AD-A220 031

REPORT DOCUMENTATION PAGE
(READ INSTRUCTIONS BEFORE COMPLETING FORM)

REPORT NUMBER: AIM 1180

TITLE (and Subtitle): How To Do the Right Thing

TYPE OF REPORT & PERIOD COVERED: Memorandum

AUTHOR(s): Pattie Maes

CONTRACT OR GRANT NUMBER(s): N00014-86-K-0124, N00014-86-K-0685

PERFORMING ORGANIZATION NAME AND ADDRESS:
Artificial Intelligence Laboratory
545 Technology Square
Cambridge, MA 02139

CONTROLLING OFFICE NAME AND ADDRESS:
Advanced Research Projects Agency
1400 Wilson Blvd.
Arlington, VA 22209

MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office):
Office of Naval Research
Information Systems
Arlington, VA 22217

REPORT DATE: October 1989

NUMBER OF PAGES: 51

SECURITY CLASS. (of this report): UNCLASSIFIED

DISTRIBUTION STATEMENT (of this Report): Distribution is unlimited

DTIC ELECTE, APR 02 1990

KEY WORDS: Action Selection, Control Architectures, Planning, Spreading
Activation, Algorithms, Autonomous Agents

ABSTRACT (Block 20):

This paper presents a novel approach to the problem of action selection
for an autonomous agent. An agent is viewed as a collection of competence
modules. Action selection is modelled as an emergent property of an
activation/inhibition dynamics among these modules. A concrete action selection
algorithm is presented and a detailed account of the results is given. This
algorithm combines characteristics of both traditional planners and reactive
(continued on back)

DD FORM 1473, EDITION OF 1 NOV 65 IS OBSOLETE
S/N 0102-014-6601

UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

Approved for public release;
Distribution Unlimited

MASSACHUSETTS  INSTITUTE  OF  TECHNOLOGY 
ARTIFICIAL  INTELLIGENCE  LABORATORY 


A.I.  Memo  No.  1180 


December  1989 


How  To  Do  the  Right  Thing 

Pattie  Maes 


Abstract 

This paper presents a novel approach to the problem of action selection
for an autonomous agent. An agent is viewed as a collection of competence
modules. Action selection is modeled as an emergent property
of an activation/inhibition dynamics among these modules. A concrete
action selection algorithm is presented and a detailed account of
the results is given. This algorithm combines characteristics of both
traditional planners and reactive systems: it produces fast and robust
activity in a tight interaction loop with the environment, while at the
same time allowing for some prediction and planning to take place. It
provides global parameters, which one can use to tune the action selection
behavior to the characteristics of the task environment. As such
one can smoothly trade off goal-orientedness for situation-orientedness,
bias towards ongoing plans (inertia) for adaptivity, thoughtfulness for
speed, and adjust its sensitivity to goal conflicts.

1  Introduction


This  paper  addresses  the  following  problem.  Imagine  an  autonomous  agent  which 
has  to  achieve  a  number  of  global  goals  in  a  complex  dynamic  environment.  An 
example  could  be  a  rover  that  has  to  explore  Mars  and  collect  samples  of  soil.  How 
can  such  an  agent  select  ‘the  most  appropriate’  or  ‘the  most  relevant’  next  action 
to  take  at  a  particular  moment,  when  facing  a  particular  situation?  Important 
constraints  are  that  the  world  is  too  complex  to  be  entirely  predictable  and  that 
the  agent  has  limited  computational  resources  and  limited  time  resources.  This 
implies  that  the  action  selection  cannot  be  completely  ‘rational’  or  optimal.  It 
should,  however,  be  robust,  fast,  and  make  ‘good  enough’  decisions  (Simon,  1955). 
By  ‘good  enough’  we  mean,  among  other  things,  that  the  action  selection  behavior 
should  demonstrate  the  following  characteristics: 


•  it  favors  actions  that  are  goal-oriented,  in  particular,  actions  that  contribute 
to  several  goals  at  once, 

•  it  favors  actions  that  are  relevant  to  the  current  situation,  in  particular  it 
exploits  opportunities  and  is  highly  adaptive  to  unpredictable  and  changing 
situations, 

•  it favors actions that contribute to the ongoing goal/plan (unless another
action rates a lot better), i.e., it ‘sticks’ to a particular goal unless there is
a good reason to start working on something different,


•  it  looks  ahead  (or  ‘plans’),  in  particular  to  avoid  hazardous  situations  and 
handle  interacting  and  conflicting  goals, 


•  it  is  robust  (never  completely  breaks  down),  even  when  certain  components 
fail, 

•  and  it  is  reactive  and  fast. 


The paper studies this problem in the context of the Society of Mind theory
(Minsky, 1986), to which the Subsumption Architecture (Brooks, 1986) is also
related. This theory suggests building an intelligent system as a society
of interacting, mindless agents, each having its own specific competence. For
example, a society of agents that is able to build a tower would incorporate
‘competence modules’ for finding a block, for grasping a block, for moving a
block, etc. The idea is that competence modules cooperate (locally) in such a
way that the society as a whole functions properly. Such an architecture is
very attractive because of its distributedness, modular structure, emergent
global functionality and robustness.




One of the open problems is how action can be controlled in such a distributed
system. More specifically: (i) how is it determined whether or not some
competence module should become active (take some real world actions by
controlling the effectors) at a specific moment, and (ii) what are the factors
that determine cooperation among certain competence modules? Several solutions
can be adopted. One approach is to hand-code (and thereby hard-wire) the control
flow among the competence modules (Brooks, 1986). Another approach is to
introduce a hierarchical structure to tell competence modules whether they are
allowed to perform an action or not. This paper investigates yet another,
entirely different type of solution.

The  hypotheses  that  are  tested  are: 

•  ‘good  enough’  action  selection  of  the  global  system  can  be  obtained  by  letting 
the  competence  modules  activate  and  inhibit  each  other  in  the  right  way, 

•  no  ‘bureaucratic’  competence  modules  are  necessary  (i.e.,  modules  whose 
only  competence  is  determining  which  other  modules  should  be  activated  or 
inhibited)  nor  do  we  need  global  forms  of  control. 

We are studying the adequacy of these hypotheses and attempting to determine
which activation/inhibition dynamics is appropriate. To this end we are
developing a series of algorithms and testing them in computer simulations. One
such algorithm was discussed in (Maes, 1989). This paper describes a variation
on the algorithm which is simpler and produces more interesting results¹.

Experiments have been performed for several applications. The resulting
systems do exhibit the desired properties of goal-orientedness,
situation-orientedness, adaptivity, robustness, looking ahead, etc. Further,
global parameters make it possible to smoothly mediate between these action
selection criteria, such as trading off goal-orientedness for
data-orientedness, adaptivity for inertia, and thoughtfulness for speed, and
adjusting the sensitivity to goal conflicts.

One cannot classify this algorithm as either belonging to the traditional AI
approach (in which competence is programmed) or to the connectionist approach (in
which competence is the result of tabula rasa learning). Nor is it a hybrid system
in the sense that there would be a distinct symbolic and subsymbolic component.
Instead, the algorithm completely integrates characteristics of both approaches by
using a connectionist computational model on a symbolic, structured
representation. By doing so, it combines the best of both worlds:

•  From connectionism it inherits the interesting properties of intrinsic
parallelism, fault-tolerance, sophisticated retrieval and matching capabilities,
density (or continuity) and global emergent computation from uniform local
interaction rules. On the other hand, it avoids putting the whole burden on
learning and classification (without excluding the possibility of applying the
learning techniques developed in this area).

¹ In particular, this algorithm also makes use of ‘inhibition’ among modules, which makes
it possible to deal with interacting goals. Further, there are new results on how the global
parameters can be used to tune the action selection behavior along different dimensions.

•  From symbolic AI, it adopts representation and structuring principles. The
network is prewired, its links have specific meanings which can be understood
(such as causality) and nodes are large, meaningful units. Thus, the
algorithm inherits such interesting properties as explanation facilities and
programmability (the network can be augmented by hand). It further provides a
compositional solution to the problem of action selection, which means that
the same parts are reused for different problems (e.g. the same network can
be given different goals at different times). As a consequence, the networks
are smaller (and therefore might prove to be easier to learn or improve).
On the other hand, the algorithm avoids problems of traditional AI
solutions such as seriality/slowness, brittleness, rigidity, and the
communication complexity of distributed AI systems.

This  paper  is  structured  as  follows:  section  2  introduces  the  algorithm  for 
action  selection,  section  3  presents  a  mathematical  model,  section  4  sketches  how 
it  works,  section  5  discusses  the  empirical  results  obtained,  section  6  reflects  on 
the  limits  of  the  current  algorithm,  section  7  compares  the  algorithm  with  related 
work,  and  finally,  section  8  draws  some  conclusions. 

2  Algorithm 

An autonomous agent is viewed as a set of competence modules. These competence
modules resemble the operators of a classical planning system. A competence
module $i$ can be described by a tuple $(c_i, a_i, d_i, \alpha_i)$. $c_i$ is a
list of preconditions which have to be fulfilled before the competence module
can become active. $a_i$ and $d_i$ represent the expected effects of the
competence module’s action in terms of an add list and a delete list. In
addition, each competence module has a level of activation $\alpha_i$. A
competence module is executable at time $t$ when all of its preconditions are
observed to be true at time $t$. An executable competence module whose
activation level surpasses a threshold may be selected, which means that it
performs some real world actions. The operation of a competence module (what
computation it performs, what actions it takes and how) is not made explicit,
i.e., competence modules could be hard-wired inside, they could perform logical
inference, or whatever.
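The tuple description above can be sketched in a few lines of Python. This is only an illustration: the class name `Module`, the helper `executable`, and the example module `pick-up-sander` with its proposition `sander-in-hand` are assumptions made for this sketch, not definitions taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """A competence module (c_i, a_i, d_i, alpha_i); names are illustrative."""
    name: str
    preconditions: frozenset  # c_i
    add_list: frozenset       # a_i
    delete_list: frozenset    # d_i
    activation: float = 0.0   # alpha_i


def executable(module, state):
    """A module is executable when all of its preconditions are observed true."""
    return module.preconditions <= state


# Hypothetical module, used only to exercise the definition.
pick_up_sander = Module(
    name="pick-up-sander",
    preconditions=frozenset({"sander-somewhere", "hand-is-empty"}),
    add_list=frozenset({"sander-in-hand"}),
    delete_list=frozenset({"sander-somewhere", "hand-is-empty"}),
)
```

Whether a module is actually selected additionally depends on its activation level and the threshold; this sketch covers only the executability test.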

Competence modules are linked in a network through three types of links:
successor links, predecessor links, and conflicter links. The description of the
competence modules of an autonomous agent in terms of a precondition list, add
list and delete list completely defines this network:

•  There is a successor link from competence module $x$ to competence module
$y$ (‘$x$ has $y$ as successor’) for every proposition $p$ that is a member of the
add list of $x$ and also a member of the precondition list of $y$ (so more than
one successor link between two competence modules may exist). Formally,
given competence module $x = (c_x, a_x, d_x, \alpha_x)$ and competence module
$y = (c_y, a_y, d_y, \alpha_y)$, there is a successor link from $x$ to $y$ for
every proposition $p \in a_x \cap c_y$.

•  A predecessor link from module $x$ to module $y$ (‘$x$ has $y$ as predecessor’)
exists for every successor link from $y$ to $x$. Formally, given competence
module $x = (c_x, a_x, d_x, \alpha_x)$ and competence module
$y = (c_y, a_y, d_y, \alpha_y)$, there is a predecessor link from $x$ to $y$ for
every proposition $p \in c_x \cap a_y$.

•  There is a conflicter link from module $x$ to module $y$ (‘$y$ conflicts with
$x$’) for every proposition $p$ that is a member of the delete list of $y$ and a
member of the precondition list of $x$. Formally, given competence module
$x = (c_x, a_x, d_x, \alpha_x)$ and competence module
$y = (c_y, a_y, d_y, \alpha_y)$, there is a conflicter link from $x$ to $y$ for
every proposition $p \in c_x \cap d_y$.
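The three link definitions are plain set intersections, so the network can be derived mechanically from the module descriptions. The sketch below assumes a dictionary mapping a module name to its (precondition, add, delete) sets; the function name `build_links` and the three example modules are hypothetical and are not the definitions of figure 1.

```python
def build_links(modules):
    """Derive the network from module definitions.

    modules: dict name -> (preconditions, add_list, delete_list), each a set.
    Returns three lists of (from, to, proposition) triples.
    """
    successor, predecessor, conflicter = [], [], []
    for x, (cx, ax, dx) in modules.items():
        for y, (cy, ay, dy) in modules.items():
            # One link per shared proposition, as in the definitions above.
            successor += [(x, y, p) for p in sorted(ax & cy)]
            predecessor += [(x, y, p) for p in sorted(cx & ay)]
            conflicter += [(x, y, p) for p in sorted(cx & dy)]
    return successor, predecessor, conflicter


# Hypothetical module definitions, for illustration only.
MODULES = {
    "pick-up-sander": ({"sander-somewhere", "hand-is-empty"},
                       {"sander-in-hand"},
                       {"sander-somewhere", "hand-is-empty"}),
    "pick-up-board": ({"board-somewhere", "hand-is-empty"},
                      {"board-in-hand"},
                      {"board-somewhere", "hand-is-empty"}),
    "sand-board-in-hand": ({"sander-in-hand", "board-in-hand"},
                           {"board-sanded"},
                           set()),
}

SUCC, PRED, CONF = build_links(MODULES)
```

The sketch applies the definitions literally, so a module that deletes one of its own preconditions gets a conflicter link to itself; the rule that a module never inhibits itself would then be enforced when activation is spread, not when links are built.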

The  intuitive  idea  is  that  modules  use  these  links  to  activate  and  inhibit  each 
other,  so  that  after  some  time  the  activation  energy  accumulates  in  the  modules 
that  represent  the  ‘best’  actions  to  take  given  the  current  situation  and  goals.  Once 
the  activation  level  of  such  a  module  surpasses  a  certain  threshold,  and  provided  the 
module  is  executable,  it  becomes  active  and  takes  some  real  actions.  The  pattern 
of  spreading  activation  among  modules,  as  well  as  the  input  of  new  activation 
energy  into  the  network  is  determined  by  the  current  state  of  the  environment  and 
the  current  global  goals  of  the  agent: 

•  Activation by the State

There is input of activation energy coming from the state of the environment
towards modules that partially match the current state². A competence
module is said to partially match the current state if at least one of its
preconditions is observed to be true.

² Notice that we do not make the assumption that there is a global continuously updated
world model. In a real robot, each proposition would be delivered by a virtual sensor, which is a
module that decides upon the basis of real sensor data whether a certain proposition should be
considered true.

•  Activation by the Goals

A second source of activation energy is the global goals of the agent. They
increase the activation level of modules that achieve one of the global goals.
A module is said to achieve one of the global goals if one of the goals is a
member of the add list of the competence module. Notice that we distinguish
two types of goals: once-only goals have to be achieved only once, i.e. as soon
as they are achieved, they are deleted from the list of global goals. Permanent
goals have to be achieved continuously. An example of the first is the goal
‘spray-paint-car’,  an  example  of  the  second  would  be  ‘battery-50 

•  Inhibition by the Protected Goals

Further, there is an external inhibition (or removal of activation) by the
global goals of the agent that have already been achieved and should be
protected. These ‘protected goals’ remove some of the activation from the
modules that would undo them. A module is said to undo one of the protected
goals when one of the protected goals is a member of the delete list of
the module.

These processes are continuous: there is a continual flow of activation energy
towards the modules that partially match the current state and towards the
modules that realize one of the global goals (at every timestep their activation
levels are increased). There is a continual decrease of the activation level of
the modules that undo the protected goals. This means that the state of the
environment and the global goals may change unpredictably at any moment in time.
If this happens, the external input of activation automatically flows to other
competence modules.

Besides  the  impact  on  activation  levels  from  the  state  and  goals,  competence 
modules  also  activate  and  inhibit  each  other.  Modules  spread  activation  along 
their  links  as  follows: 

•  Activation of Successors

An executable competence module $x$ spreads activation forward. It increases
(by a fraction of its own activation level) the activation level of those
successors $y$ for which the shared proposition $p \in a_x \cap c_y$ is not true.
Intuitively, we want these successor modules to become more activated because
they are ‘almost executable’, since more of their preconditions will be fulfilled
after the competence module has become active. Formally, given that competence
module $x = (c_x, a_x, d_x, \alpha_x)$ is executable, it spreads forward through
those successor links for which the proposition that defines them, $p \in a_x$,
is false.

•  Activation of Predecessors

A competence module $x$ that is not executable spreads activation backward.
It increases (by a fraction of its own activation level) the activation level of
those predecessors $y$ for which the shared proposition $p \in c_x \cap a_y$ is
not true. Intuitively, a non-executable competence module spreads to the modules
that ‘promise’ to fulfill its preconditions that are not yet true, so that the
competence module may become executable afterwards. Formally, given that
competence module $x = (c_x, a_x, d_x, \alpha_x)$ is not executable, it spreads
backward through those predecessor links for which the proposition that defines
them, $p \in c_x$, is false.

•  Inhibition of Conflicters

Every competence module $x$ (executable or not) decreases (by a fraction of
its own activation level) the activation level of those conflicters $y$ for which
the shared proposition $p \in c_x \cap d_y$ is true. Intuitively, a module tries to
prevent a module that undoes its true preconditions from becoming active.
Notice that we do not allow a module to inhibit itself (while it may activate
itself). In case of mutual conflict of modules, only the one with the higher
activation level inhibits the other. This prevents the phenomenon that the
most relevant modules eliminate each other. Formally, competence module
$x = (c_x, a_x, d_x, \alpha_x)$ takes away activation energy through all of its
conflicter links for which the proposition that defines them, $p \in c_x$, is
true, except those links for which there exists an inverse conflicter link that
is stronger.

The  global  algorithm  performs  a  loop,  in  which  at  every  timestep  the  following 
computation  takes  place  over  all  of  the  competence  modules: 

1.  The  impact  of  the  state,  goals  and  protected  goals  on  the  activation  level  of 
a  module  is  computed. 

2.  The  way  the  competence  module  activates  and  inhibits  related  modules 
through  its  successor  links,  predecessor  links  and  conflicter  links  is  computed. 

3.  A  decay  function  ensures  that  the  overall  activation  level  remains  constant. 

4.  The competence module that fulfills the following three conditions becomes
active: (i) it has to be executable, (ii) its level of activation has to surpass
a certain threshold, and (iii) it must have a higher activation level than all
other competence modules that fulfill conditions (i) and (ii). When two
competence modules fulfill these conditions (i.e., they are equally strong),
one of them is chosen randomly. The activation level of the module that has
become active is reinitialized to 0³. If none of the modules fulfills conditions
(i) and (ii), the threshold is lowered by 10%.
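Step 4 of the loop can be sketched as a small selection function. This is a minimal illustration under stated assumptions: the function name `select`, its arguments, and the use of `>=` for "surpasses the threshold" are choices made for this sketch; resetting the chosen module's activation to 0 is left to the caller.

```python
import random


def select(activation, executable, theta, theta_initial, rng=random):
    """Pick the module to activate, or lower the threshold if none qualifies.

    activation: dict name -> activation level
    executable: set of names of currently executable modules
    Returns (chosen name or None, threshold to use at the next timestep).
    """
    # Conditions (i) and (ii): executable and at or above the threshold.
    candidates = [m for m in sorted(executable) if activation[m] >= theta]
    if not candidates:
        return None, theta * 0.9          # lower the threshold by 10%
    # Condition (iii): highest activation; ties are broken randomly.
    best = max(activation[m] for m in candidates)
    chosen = rng.choice([m for m in candidates if activation[m] == best])
    return chosen, theta_initial          # threshold is reset after a selection
```

For example, with activation levels `{"a": 50, "b": 60, "c": 70}`, executable modules `{"a", "b"}` and a threshold of 45, module `b` is chosen even though `c` has more activation, because `c` is not executable.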

These four steps are repeated indefinitely. Interesting global observable
properties are: the sequence of competence modules that have become active, the
optimality of this sequence (which is computed by a domain-dependent function),
and the speed with which it was obtained (the number of timesteps a competence
module has become active relative to the total number of timesteps the system
has been running).

³ If this were not the case, modules could become active a couple of times in a row without
this really being desirable.

Four  global  parameters  can  be  used  to  ‘tune’  the  spreading  activation  dynamics, 
and  thereby  the  action  selection  behavior  of  the  agent: 

1.  $\theta$, the threshold for becoming active, and related to it, $\pi$, the
mean level of activation. $\theta$ is lowered by 10% each time none of the
modules could be selected. It is reset to its initial value when a module could
be selected.

2.  $\phi$, the amount of activation energy a proposition that is observed to be
true injects into the network.

3.  $\gamma$, the amount of activation energy a goal injects into the network.

4.  $\delta$, the amount of activation energy a protected goal takes away from
the network.

These parameters also determine the amount of activation that modules spread
forward, backward or take away. More precisely, for each false proposition in its
precondition list, a non-executable module spreads $\alpha$ to its predecessors.
For each false proposition in its add list, an executable module spreads
$\alpha \frac{\phi}{\gamma}$ to its successors. For each true proposition in its
precondition list a module takes away $\alpha \frac{\delta}{\gamma}$ from its
conflicters. These factors were chosen this way because the internal spreading
of activation should have the same semantics/effects as the input/output by the
state and the goals. The ratios of input from the state versus input from the
goals versus output by the protected goals are the same as the ratios of input
from predecessors versus input from successors versus output by modules with
which a module conflicts. Intuitively, we want to view preconditions that are
not yet true as subgoals, effects that are about to be true as ‘predictions’,
and preconditions that are true as protected subgoals.

The algorithm as described so far has a drawback that has to be dealt
with. The length of a precondition list, add list or delete list affects the
input and output of activation to a module. In particular, a module which has a
lot of propositions in its add list and precondition list has more sources of
activation energy than a module that only has a few. Therefore, all input of
activation to a module or removal of activation from a module is weighted by
$\frac{1}{n}$, where $n$ is (i) the number of propositions in the precondition
list (in the case of input coming from the state and from the predecessors),
(ii) the number of propositions in the add list (in the case of input from the
goals or from successors), or (iii) the number of propositions in the delete
list (in the case of removal of activation by the protected goals or by modules
with which the module conflicts).

Finally, we want modules that achieve the same goal or modules that use
the same precondition to compete with one another to become active (we view
them as representing a disjunction or choice point). Therefore, the amount of
activation that is spread or taken away for a particular proposition is split among
the affected modules. For example, for a particular proposition $p$ that is observed
to be true, the state divides $\phi$ among all of the modules that have that
proposition in their precondition list. The same holds not only for the effect of
the goals and the protected goals, but also for the internal spreading of
activation. For example, when a large number of modules achieve a precondition of
module $m$, the activation $\alpha_m$ that $m$ spreads backward for that
proposition is equally divided among all of these modules. When, on the other
hand, there is only one other module that can make this precondition true,
module $m$ increases the activation level of that module by its own activation
level $\alpha_m$. One implicit assumption on which this is based is that the
preconditions are in conjunctive normal form. A disjunction of two preconditions
would be represented by a single proposition, for which two competence modules
exist that can make it true.

3  Mathematical  Model 

This  section  of  the  paper  presents  a  mathematical  description  of  the  algorithm  so 
as  to  make  reproduction  of  the  results  possible.  Given: 

•  a set of competence modules $1 \ldots n$,

•  a set of propositions $P$,

•  a function $S(t)$ returning the propositions that are observed to be true at
time $t$ (the state of the environment as perceived by the agent); $S$ being
implemented by an independent process (or the real world),

•  a function $G(t)$ returning the propositions that are a goal of the agent at
time $t$; $G$ being implemented by an independent process,

•  a function $R(t)$ returning the propositions that are a goal of the agent that
has already been achieved at time $t$; $R$ being implemented by an independent
process (e.g. some internal or external goal creator),

•  a function $executable(i,t)$, which returns 1 if competence module $i$ is
executable at time $t$ (i.e., if all of the preconditions of competence module
$i$ are members of $S(t)$), and 0 otherwise,

•  a function $M(j)$, which returns the set of modules that match proposition $j$,
i.e., the modules $x$ for which $j \in c_x$,

•  a function $A(j)$, which returns the set of modules that achieve proposition
$j$, i.e., the modules $x$ for which $j \in a_x$,

•  a function $U(j)$, which returns the set of modules that undo proposition $j$,
i.e., the modules $x$ for which $j \in d_x$,

•  $\pi$, the mean level of activation,

•  $\theta$, the threshold of activation, where $\theta$ is lowered 10% every time
no module could be selected, and is reset to its initial value whenever a module
becomes active,

•  $\phi$, the amount of activation energy injected by the state per true
proposition,

•  $\gamma$, the amount of activation energy injected by the goals per goal,

•  $\delta$, the amount of activation energy taken away by the protected goals per
protected goal.


Given competence module $x = (c_x, a_x, d_x, \alpha_x)$, the input of activation
to module $x$ from the state at time $t$ is:

$$\mathit{input\_from\_state}(x,t) = \sum_j \phi \, \frac{1}{\#M(j)} \, \frac{1}{\#c_x}$$

where $j \in S(t) \cap c_x$ and where $\#$ stands for the cardinality of a set.

The input of activation to competence module $x$ from the goals at time $t$ is:

$$\mathit{input\_from\_goals}(x,t) = \sum_j \gamma \, \frac{1}{\#A(j)} \, \frac{1}{\#a_x}$$

where $j \in G(t) \cap a_x$.

The removal of activation from competence module $x$ by the goals that are
protected at time $t$ is:

$$\mathit{taken\_away\_by\_protected\_goals}(x,t) = \sum_j \delta \, \frac{1}{\#U(j)} \, \frac{1}{\#d_x}$$

where $j \in R(t) \cap d_x$.
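The three external terms translate almost literally into code. The sketch below is an illustration only: the function name `external_input`, the dictionary layout, the three example modules and the parameter values are assumptions made here, not values reported in the paper.

```python
def external_input(x, modules, state, goals, protected, phi, gamma, delta):
    """Return (input from state, input from goals, amount taken away by
    protected goals) for module x at one timestep.

    modules: dict name -> (preconditions, add_list, delete_list), each a set.
    """
    cx, ax, dx = modules[x]

    def count(index, j):
        # #M(j), #A(j) or #U(j): modules whose list number `index` contains j.
        return sum(1 for m in modules.values() if j in m[index])

    from_state = sum(phi / (count(0, j) * len(cx)) for j in state & cx)
    from_goals = sum(gamma / (count(1, j) * len(ax)) for j in goals & ax)
    taken_away = sum(delta / (count(2, j) * len(dx)) for j in protected & dx)
    return from_state, from_goals, taken_away


# Hypothetical module definitions and parameter values, for illustration only.
MODULES = {
    "pick-up-sander": ({"sander-somewhere", "hand-is-empty"},
                       {"sander-in-hand"},
                       {"sander-somewhere", "hand-is-empty"}),
    "pick-up-board": ({"board-somewhere", "hand-is-empty"},
                      {"board-in-hand"},
                      {"board-somewhere", "hand-is-empty"}),
    "sand-board-in-hand": ({"sander-in-hand", "board-in-hand"},
                           {"board-sanded"},
                           set()),
}
STATE = {"hand-is-empty", "sander-somewhere", "board-somewhere"}
GOALS = {"board-sanded"}
```

With `phi = 20`, the proposition `hand-is-empty` is shared by two modules, so each receives only half of its contribution; that is the competition between modules using the same precondition described in section 2.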

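The following equation specifies what a competence module
$x = (c_x, a_x, d_x, \alpha_x)$ spreads backward to a competence module
$y = (c_y, a_y, d_y, \alpha_y)$:

$$
\mathit{spreads\_bw}(x,y,t) =
\begin{cases}
\sum_j \alpha_x(t-1) \, \frac{1}{\#A(j)} \, \frac{1}{\#a_y} & \text{if } executable(x,t) = 0 \\
0 & \text{if } executable(x,t) = 1
\end{cases}
$$

where $j \notin S(t) \wedge j \in c_x \cap a_y$.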
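
The following equation specifies what module $x$ spreads forward to module $y$:

$$
\mathit{spreads\_fw}(x,y,t) =
\begin{cases}
\sum_j \alpha_x(t-1) \, \frac{\phi}{\gamma} \, \frac{1}{\#M(j)} \, \frac{1}{\#c_y} & \text{if } executable(x,t) = 1 \\
0 & \text{if } executable(x,t) = 0
\end{cases}
$$

where $j \notin S(t) \wedge j \in a_x \cap c_y$.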
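The backward and forward spreading rules can be sketched as two small functions, assuming the weights described in section 2 (a module's own activation for backward spreading, scaled by $\phi/\gamma$ for forward spreading) together with the splitting and list-length normalization. Function names, the example modules and the numbers are hypothetical; inhibition through conflicter links is deliberately left out of this sketch.

```python
def achievers(modules, j):
    """#A(j): number of modules with j in their add list."""
    return sum(1 for (_, a, _) in modules.values() if j in a)


def matchers(modules, j):
    """#M(j): number of modules with j in their precondition list."""
    return sum(1 for (c, _, _) in modules.values() if j in c)


def spreads_bw(x, y, modules, alpha, state):
    """Activation a non-executable x sends back to y for false shared props."""
    cx, _, _ = modules[x]
    _, ay, _ = modules[y]
    if cx <= state:                      # x is executable: nothing goes back
        return 0.0
    return sum(alpha[x] / (achievers(modules, j) * len(ay))
               for j in (cx & ay) - state)


def spreads_fw(x, y, modules, alpha, state, phi, gamma):
    """Activation an executable x sends forward to y for false shared props."""
    cx, ax, _ = modules[x]
    cy, _, _ = modules[y]
    if not (cx <= state):                # x is not executable: nothing forward
        return 0.0
    return sum(alpha[x] * (phi / gamma) / (matchers(modules, j) * len(cy))
               for j in (ax & cy) - state)


# Hypothetical module definitions and activation levels, for illustration only.
MODULES = {
    "pick-up-sander": ({"sander-somewhere", "hand-is-empty"},
                       {"sander-in-hand"},
                       {"sander-somewhere", "hand-is-empty"}),
    "sand-board-in-hand": ({"sander-in-hand", "board-in-hand"},
                           {"board-sanded"},
                           set()),
}
STATE = {"hand-is-empty", "sander-somewhere"}
ALPHA = {"pick-up-sander": 15.0, "sand-board-in-hand": 70.0}
```

Here `sand-board-in-hand` is not executable, so it passes its full activation of 70 back to `pick-up-sander`, the only module that promises `sander-in-hand`.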

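The following equation specifies what module $x$ takes away from module $y$:

$$
\mathit{takes\_away}(x,y,t) =
\begin{cases}
0 & \text{if } (\alpha_x(t-1) < \alpha_y(t-1)) \wedge (\exists i \in S(t) \cap c_y \cap d_x) \\
\max\left(\sum_j \alpha_x(t-1) \, \frac{\delta}{\gamma} \, \frac{1}{\#U(j)} \, \frac{1}{\#d_y},\ \alpha_y(t-1)\right) & \text{otherwise}
\end{cases}
$$

where $j \in c_x \cap d_y \cap S(t)$.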

The activation level of a competence module $y$ at time $t$ is defined as:

$$
\begin{aligned}
\alpha(y,0) &= 0 \\
\alpha(y,t) &= \mathit{decay}\Big( \alpha(y,t-1)\,(1 - \mathit{active}(y,t-1)) \\
&\quad + \mathit{input\_from\_state}(y,t) + \mathit{input\_from\_goals}(y,t) \\
&\quad - \mathit{taken\_away\_by\_protected\_goals}(y,t) \\
&\quad + \sum_{x,z} \big( \mathit{spreads\_bw}(x,y,t) + \mathit{spreads\_fw}(x,y,t) - \mathit{takes\_away}(z,y,t) \big) \Big)
\end{aligned}
$$

where $x$ ranges over the modules of the network, $z$ ranges over the modules of
the network minus the module $y$, $t > 0$, and the decay function is such that
the global activation remains constant:

$$\sum_y \alpha_y(t) = n\pi$$
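The text only requires that the decay keeps the total activation at $n\pi$; it does not say how. One simple reading, shown here purely as an assumption, is a proportional rescaling of all activation levels. The function name `decay` and its arguments are chosen for this sketch.

```python
def decay(raw, pi):
    """Rescale activation levels so that they sum to n * pi.

    raw: dict name -> activation level before decay
    pi:  mean level of activation
    Proportional rescaling is an assumption of this sketch; the constraint
    itself (total activation equals n * pi) is what the model prescribes.
    """
    total = sum(raw.values())
    if total == 0:
        return dict(raw)                 # nothing to rescale yet
    scale = len(raw) * pi / total
    return {name: level * scale for name, level in raw.items()}
```

For instance, with two modules, `pi = 20` and raw levels of 60 and 20, the rescaled levels are 30 and 10, which sum to `2 * 20`.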

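The competence module that becomes active at time $t$ is module $i$ such that:

$$
\mathit{active}(i,t) =
\begin{cases}
1 & \text{if }
\begin{cases}
\alpha(i,t) \geq \theta & (1) \\
executable(i,t) = 1 & (2) \\
\forall j \text{ fulfilling } (1) \wedge (2) : \alpha(i,t) \geq \alpha(j,t) & (3)
\end{cases} \\
0 & \text{otherwise}
\end{cases}
$$
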


4  Example 

This section illustrates the algorithm with a concrete, simple example. Later in
the paper more interesting examples are discussed. The example is taken from
the planning chapter of (Charniak & McDermott, 1985). It involves a robot with
two hands which has to spray-paint itself and sand a board. The task has some
complexity to it. The robot has to coordinate the use of its hands or otherwise
be clever enough to use a vise to hold the board and perform the jobs in parallel.
Furthermore, it should perform the sanding of the board first, because once it
has painted itself, it is no longer operational. The definition of the competence
modules in terms of their precondition lists, add lists and delete lists is presented
in figure 1.

On  the  basis  of  these  definitions  the  spreading  activation  network  in  figure  2 
is  constructed.  A  possible  solution  to  the  problem  would  be  to  pick  up  the  board, 
put  it  in  the  vise,  pick  up  the  sander,  sand  the  board  in  the  vise,  pick  up  the 
sprayer  and  spray  paint  itself. 

A  (computer-)  environment  has  been  built  in  which  the  behavior  of  such  a 
network  of  competence  modules  can  be  simulated.  The  program  is  written  in 
Common  LISP  on  a  SYMBOLICS  machine.  Figure  3  shows  a  bitmap  of  the  system 
simulating  the  network  described  above.  The  initial  state  of  the  environment 
is S(0) = (hand-is-empty, hand-is-empty, sander-somewhere, sprayer-somewhere,
operational, board-somewhere), the initial goals are G(0) = (board-sanded,
self-painted).

It is also possible to obtain a trace showing in detail how the spreading
activation has evolved. In the remainder of this section, we study the trace of
the experiment shown in figure 3 in order to explain its action selection
behavior. The activation levels of the competence modules are initialized to
zero. At time 1, the modules don't have any activation energy to spread yet, so
there is only the input/output from the state and goals. Notice that
SAND-BOARD-IN-HAND and SAND-BOARD-IN-VISE have to share the activation energy
coming from the goal 'board-sanded'.

TIME:  1 

state of the environment: (HAND-IS-EMPTY HAND-IS-EMPTY SANDER-SOMEWHERE
                           SPRAYER-SOMEWHERE OPERATIONAL BOARD-SOMEWHERE)
goals of the environment: (BOARD-SANDED SELF-PAINTED)
protected goals of the environment: NIL

state gives PICK-UP-SANDER an extra activation of 3.3333333
state gives PICK-UP-SPRAYER an extra activation of 3.3333333
state gives PICK-UP-BOARD an extra activation of 3.3333333
state gives PICK-UP-SANDER an extra activation of 10.0
state gives PICK-UP-SPRAYER an extra activation of 10.0
state gives SPRAY-PAINT-SELF an extra activation of 3.3333333
state gives SAND-BOARD-IN-HAND an extra activation of 2.2222223
state gives SAND-BOARD-IN-VISE an extra activation of 2.2222223
state gives PICK-UP-BOARD an extra activation of 10.0
goals give SAND-BOARD-IN-HAND an extra activation of 35.0
goals give SAND-BOARD-IN-VISE an extra activation of 35.0
goals give SPRAY-PAINT-SELF an extra activation of 70.0
(defmodule PICK-UP-SPRAYER
  :condition-list '(sprayer-somewhere hand-is-empty)
  :add-list '(sprayer-in-hand)
  :delete-list '(sprayer-somewhere hand-is-empty))

(defmodule PICK-UP-SANDER
  :condition-list '(sander-somewhere hand-is-empty)
  :add-list '(sander-in-hand)
  :delete-list '(sander-somewhere hand-is-empty))

(defmodule PICK-UP-BOARD
  :condition-list '(board-somewhere hand-is-empty)
  :add-list '(board-in-hand)
  :delete-list '(board-somewhere hand-is-empty))

(defmodule PUT-DOWN-SPRAYER
  :condition-list '(sprayer-in-hand)
  :add-list '(sprayer-somewhere hand-is-empty)
  :delete-list '(sprayer-in-hand))

(defmodule PUT-DOWN-SANDER
  :condition-list '(sander-in-hand)
  :add-list '(sander-somewhere hand-is-empty)
  :delete-list '(sander-in-hand))

(defmodule PUT-DOWN-BOARD
  :condition-list '(board-in-hand)
  :add-list '(board-somewhere hand-is-empty)
  :delete-list '(board-in-hand))

(defmodule SAND-BOARD-IN-HAND
  :condition-list '(operational board-in-hand sander-in-hand)
  :add-list '(board-sanded)
  :delete-list '())

(defmodule SAND-BOARD-IN-VISE
  :condition-list '(operational board-in-vise sander-in-hand)
  :add-list '(board-sanded)
  :delete-list '())

(defmodule SPRAY-PAINT-SELF
  :condition-list '(operational sprayer-in-hand)
  :add-list '(self-painted)
  :delete-list '(operational))

(defmodule PLACE-BOARD-IN-VISE
  :condition-list '(board-in-hand)
  :add-list '(hand-is-empty board-in-vise)
  :delete-list '(board-in-hand))


Figure  1:  Definition  of  the  competence  modules  involved  in  the  toy  example. 
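To make the role of the three lists concrete, here is a small Python sketch of what executing one of these modules does to the state: check the condition list, remove the delete list, add the add list. It treats the state as a multiset, which is an assumption suggested by 'hand-is-empty' appearing twice in the initial state of the two-handed robot. The function name and the use of `collections.Counter` are choices made for this illustration, not part of the original program.

```python
from collections import Counter


def execute(state, conditions, adds, deletes):
    """Return the state after running a module; the state is a multiset of propositions."""
    if any(state[p] < 1 for p in conditions):
        raise ValueError("module is not executable in this state")
    new_state = state.copy()
    new_state.subtract(deletes)  # remove one occurrence of each deleted proposition
    new_state.update(adds)       # add one occurrence of each added proposition
    return +new_state            # unary plus drops entries whose count fell to zero


initial = Counter(["hand-is-empty", "hand-is-empty", "sander-somewhere",
                   "sprayer-somewhere", "operational", "board-somewhere"])

# PICK-UP-SANDER as defined in Figure 1.
after_pick_up = execute(
    initial,
    conditions=["sander-somewhere", "hand-is-empty"],
    adds=["sander-in-hand"],
    deletes=["sander-somewhere", "hand-is-empty"],
)
print(sorted(after_pick_up.elements()))
```

With the multiset reading, picking up the sander uses one hand and leaves the other 'hand-is-empty' in place, so the robot can still pick up the board afterwards.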
[Figure 2 diagram. Nodes: put-down-sprayer, put-down-sander, put-down-board,
pick-up-sprayer, pick-up-sander, pick-up-board, spray-paint-self,
sand-board-in-hand, place-board-in-vise, sand-board-in-vise.]

Figure  2:  The  spreading  activation  network  for  the  toy  example.  The  predecessor 
links  (from  a  competence  module  to  its  predecessors)  are  shown  as  arrows  (the 
symbol  of  an  activation  link).  The  conflicter  links  are  shown  as  inhibition  links 
(with  a  little  circle  at  the  end).  The  successor  links  are  not  shown  (there  is  a 
successor  link  in  the  inverse  direction  for  every  predecessor  link). 
[Figure 3 screen image. Recoverable contents:]

Influence from goals: 70
Influence from state: 20
Influence from achieved goals: 50
Mean activation level: 20
Threshold: 45

State of the environment:
(SELF-PAINTED SPRAYER-IN-HAND BOARD-IN-VISE
BOARD-SANDED SANDER-IN-HAND)

Goals in the environment:
NIL

Activated: (no-agent no-agent PICK-UP-SANDER
no-agent PICK-UP-BOARD no-agent
SAND-BOARD-IN-HAND no-agent no-agent
no-agent no-agent no-agent no-agent
no-agent no-agent no-agent
PLACE-BOARD-IN-VISE PICK-UP-SPRAYER
SPRAY-PAINT-SELF)

Optimality: 100.0 %. Speed: 31.578947 %

COMPUTING INFLUENCE FROM STATE AND GOALS
COMPUTING SPREADING OF ACTIVATION
COMPUTING DECAY BY TIME
AGENT BECOMING ACTIVE: SPRAY-PAINT-SELF

Activation plots, one pane per module: PLACE-BOARD-IN-VISE, SPRAY-PAINT-SELF,
SAND-BOARD-IN-HAND, SAND-BOARD-IN-VISE, PICK-UP-SANDER, PICK-UP-SPRAYER,
PICK-UP-BOARD, PUT-DOWN-SPRAYER, PUT-DOWN-SANDER, PUT-DOWN-BOARD.


Figure  3:  The  user  interface  of  the  simulation  environment.  The  upper  pane  is 
a  menu  of  commands.  It  makes  it  possible  to  define  a  new  network,  to  initialize 
the  current  network,  to  change  the  global  parameters,  to  change  the  state  of  the 
environment,  to  change  the  goals  of  the  network  and  to  run  or  step  through  the 
behavior  of  a  network.  The  left-hand  panes  display  the  parameters,  the  current 
state  of  the  environment,  the  current  goals  of  the  network  and  the  results  of  the 
simulation  (among  which  is  the  list  of  activated  modules).  The  right-hand  panes 
display the activation levels of competence modules over time (the X-axis
represents time, while the Y-axis displays the activation level). The little circles tell
when  a  competence  module  has  become  active. 
activation-levels of modules after decay:
activation-level PLACE-BOARD-IN-VISE: 0.0
activation-level SPRAY-PAINT-SELF: 73.333336
activation-level SAND-BOARD-IN-HAND: 37.22222
activation-level SAND-BOARD-IN-VISE: 37.22222
activation-level PICK-UP-SANDER: 13.333333
activation-level PICK-UP-SPRAYER: 13.333333
activation-level PICK-UP-BOARD: 13.333333
activation-level PUT-DOWN-SPRAYER: 0.0
activation-level PUT-DOWN-SANDER: 0.0
activation-level PUT-DOWN-BOARD: 0.0

NO MODULE becoming active
threshold is lowered to 40.5
None of the executable modules has accumulated enough activation to become
active. As a result the threshold is lowered by 10%. At time 2, the input/output
from the state and goals is the same as at time 1 (not reprinted). Now there is also
some spreading activation among modules. Notice that the modules that match
the goals, SPRAY-PAINT-SELF, SAND-BOARD-IN-VISE and SAND-BOARD-IN-HAND,
spread backwards to their predecessors PICK-UP-SPRAYER, PICK-UP-SANDER,
PICK-UP-BOARD, and PLACE-BOARD-IN-VISE to make their
conditions true. So the false preconditions of the modules that achieve the goals
are treated as 'subgoals' by the algorithm.

When a false precondition has only one predecessor, the module that needs it
increases that predecessor's activation level by its own activation level. For
example, PICK-UP-SPRAYER receives as much activation as SPRAY-PAINT-SELF has,
because it is the only module that achieves the precondition 'sprayer-in-hand'.
Notice further that SAND-BOARD-IN-HAND and SAND-BOARD-IN-VISE weaken
SPRAY-PAINT-SELF because it deletes their precondition 'operational'. Finally
the executable modules, PICK-UP-SPRAYER, PICK-UP-SANDER and PICK-UP-BOARD,
activate their successors. This activation is less important than the
backward spreading, because we want the impact of goals (and subgoals) to be
greater than that of the state (and the 'almost true propositions').

TIME:  2 

state  gives  . . . 

PLACE-BOARD-IN-VISE spreads 0.0 backward to PICK-UP-BOARD for BOARD-IN-HAND
SPRAY-PAINT-SELF spreads 73.333336 backward to PICK-UP-SPRAYER for SPRAYER-IN-HAND
SAND-BOARD-IN-HAND spreads 37.22222 backward to PICK-UP-BOARD for BOARD-IN-HAND
SAND-BOARD-IN-HAND spreads 37.22222 backward to PICK-UP-SANDER for SANDER-IN-HAND
SAND-BOARD-IN-HAND decreases (inhibits) SPRAY-PAINT-SELF with 26.587301 for OPERATIONAL
SAND-BOARD-IN-VISE spreads 18.61111 backward to PLACE-BOARD-IN-VISE for BOARD-IN-VISE
SAND-BOARD-IN-VISE spreads 37.22222 backward to PICK-UP-SANDER for SANDER-IN-HAND
SAND-BOARD-IN-VISE decreases (inhibits) SPRAY-PAINT-SELF with 26.587301 for OPERATIONAL
PICK-UP-SANDER spreads 0.42328046 forward to SAND-BOARD-IN-HAND for SANDER-IN-HAND
PICK-UP-SANDER spreads 0.42328046 forward to SAND-BOARD-IN-VISE for SANDER-IN-HAND
PICK-UP-SANDER spreads 1.2698413 forward to PUT-DOWN-SANDER for SANDER-IN-HAND
PICK-UP-SPRAYER spreads 0.95238096 forward to SPRAY-PAINT-SELF for SPRAYER-IN-HAND
PICK-UP-SPRAYER spreads 1.9047619 forward to PUT-DOWN-SPRAYER for SPRAYER-IN-HAND
PICK-UP-BOARD spreads 1.2698413 forward to PLACE-BOARD-IN-VISE for BOARD-IN-HAND
PICK-UP-BOARD spreads 0.42328046 forward to SAND-BOARD-IN-HAND for BOARD-IN-HAND
PICK-UP-BOARD spreads 1.2698413 forward to PUT-DOWN-BOARD for BOARD-IN-HAND
PUT-DOWN-SPRAYER spreads 0.0 backward to PICK-UP-SPRAYER for SPRAYER-IN-HAND
PUT-DOWN-SANDER spreads 0.0 backward to PICK-UP-SANDER for SANDER-IN-HAND
PUT-DOWN-BOARD spreads 0.0 backward to PICK-UP-BOARD for BOARD-IN-HAND

activation-levels of modules after decay:

activation-level PLACE-BOARD-IN-VISE: 7.447046
activation-level SPRAY-PAINT-SELF: 36.377182
activation-level SAND-BOARD-IN-HAND: 28.202648
activation-level SAND-BOARD-IN-VISE: 28.044096
activation-level PICK-UP-SANDER: 37.874393
activation-level PICK-UP-SPRAYER: 37.468196
activation-level PICK-UP-BOARD: 23.931622
activation-level PUT-DOWN-SPRAYER: 0.7134894
activation-level PUT-DOWN-SANDER: 0.4766696
activation-level PUT-DOWN-BOARD: 0.4766696

NO MODULE becoming active
threshold is lowered to 36.45


Again, none of the executable modules is activated enough to be selected.
At time 3, the spreading activation patterns remain unchanged, except for the
amounts of activation energy that are given or taken away by modules. In
particular, PICK-UP-SPRAYER receives less activation from its successor
SPRAY-PAINT-SELF than PICK-UP-SANDER receives from SAND-BOARD-IN-HAND
and SAND-BOARD-IN-VISE together.

TIME:  3 


state gives . . .

PLACE-BOARD-IN-VISE spreads ...

activation-levels of modules after decay:

activation-level PLACE-BOARD-IN-VISE: 9.699069
activation-level SPRAY-PAINT-SELF: 29.082869
activation-level SAND-BOARD-IN-HAND: 27.621669
activation-level SAND-BOARD-IN-VISE: 27.146623
activation-level PICK-UP-SANDER: 44.079823
activation-level PICK-UP-SPRAYER: 32.721424
activation-level PICK-UP-BOARD: 24.470343
activation-level PUT-DOWN-SPRAYER: 2.4768724
activation-level PUT-DOWN-SANDER: 1.6674367
activation-level PUT-DOWN-BOARD: 1.1251162

module becoming active: PICK-UP-SANDER

The module PICK-UP-SANDER now has accumulated enough activation to become
active. As a result the state changes, and thus also the input coming from the
state and the internal spreading activation patterns. Notice that
SAND-BOARD-IN-VISE and SAND-BOARD-IN-HAND now inhibit PUT-DOWN-SANDER to
prevent it from undoing the precondition 'sander-in-hand'. Notice also that
PICK-UP-BOARD decreases the activation level of PICK-UP-SPRAYER for the
precondition 'hand-is-empty'. This inhibition will become stronger in time because
SAND-BOARD-IN-VISE and SAND-BOARD-IN-HAND will be reinforced, since
more of their preconditions are now true.

TIME: 4

state of the environment: (SANDER-IN-HAND HAND-IS-EMPTY SPRAYER-SOMEWHERE OPERATIONAL
                           BOARD-SOMEWHERE)
goals of the environment: (BOARD-SANDED SELF-PAINTED)
protected goals of the environment: NIL

state gives SAND-BOARD-IN-HAND an extra activation of 2.2222223
state gives SAND-BOARD-IN-VISE an extra activation of 2.2222223
state gives PUT-DOWN-SANDER an extra activation of 6.6666665
state gives PICK-UP-SANDER an extra activation of 3.3333333
state gives PICK-UP-SPRAYER an extra activation of 3.3333333
state gives PICK-UP-BOARD an extra activation of 3.3333333
state gives PICK-UP-SPRAYER an extra activation of 10.0
state gives SPRAY-PAINT-SELF an extra activation of 3.3333333
state gives SAND-BOARD-IN-HAND an extra activation of 2.2222223
state gives SAND-BOARD-IN-VISE an extra activation of 2.2222223
state gives PICK-UP-BOARD an extra activation of 10.0
goals give SAND-BOARD-IN-HAND an extra activation of 35.0
goals give SAND-BOARD-IN-VISE an extra activation of 35.0
goals give SPRAY-PAINT-SELF an extra activation of 70.0

PLACE-BOARD-IN-VISE spreads 9.699069 backward to PICK-UP-BOARD for BOARD-IN-HAND
SPRAY-PAINT-SELF spreads 29.082869 backward to PICK-UP-SPRAYER for SPRAYER-IN-HAND
SAND-BOARD-IN-HAND spreads 27.621550 backward to PICK-UP-BOARD for BOARD-IN-HAND
SAND-BOARD-IN-HAND decreases (inhibits) PUT-DOWN-SANDER with 19.658257 for SANDER-IN-HAND
SAND-BOARD-IN-HAND decreases (inhibits) SPRAY-PAINT-SELF with 19.658257 for OPERATIONAL
SAND-BOARD-IN-VISE spreads 13.673261 backward to PLACE-BOARD-IN-VISE for BOARD-IN-VISE
SAND-BOARD-IN-VISE decreases (inhibits) PUT-DOWN-SANDER with 19.390373 for SANDER-IN-HAND
SAND-BOARD-IN-VISE decreases (inhibits) SPRAY-PAINT-SELF with 19.390373 for OPERATIONAL
PICK-UP-SANDER spreads 0.0 backward to PUT-DOWN-SANDER for SANDER-SOMEWHERE
PICK-UP-SPRAYER spreads 2.3372447 forward to SPRAY-PAINT-SELF for SPRAYER-IN-HAND
PICK-UP-SPRAYER spreads 4.6744895 forward to PUT-DOWN-SPRAYER for SPRAYER-IN-HAND
PICK-UP-SPRAYER decreases (inhibits) PICK-UP-SANDER with 5.8431115 for HAND-IS-EMPTY
PICK-UP-SPRAYER decreases (inhibits) PICK-UP-BOARD with 5.8431115 for HAND-IS-EMPTY
PICK-UP-BOARD spreads 2.3313883 forward to PLACE-BOARD-IN-VISE for BOARD-IN-HAND
PICK-UP-BOARD spreads 0.7771221 forward to SAND-BOARD-IN-HAND for BOARD-IN-HAND
PICK-UP-BOARD spreads 2.3313863 forward to PUT-DOWN-BOARD for BOARD-IN-HAND
PICK-UP-BOARD decreases (inhibits) PICK-UP-SANDER with 4.3713117 for HAND-IS-EMPTY
PUT-DOWN-SPRAYER spreads 2.4768724 backward to PICK-UP-SPRAYER for SPRAYER-IN-HAND
PUT-DOWN-SANDER spreads 0.23820525 forward to PICK-UP-SANDER for SANDER-SOMEWHERE
PUT-DOWN-BOARD spreads 1.1251162 backward to PICK-UP-BOARD for BOARD-IN-HAND

activation-levels of modules after decay:

activation-level PLACE-BOARD-IN-VISE: 13.320736
activation-level SPRAY-PAINT-SELF: 34.184002
activation-level SAND-BOARD-IN-HAND: 35.24447
activation-level SAND-BOARD-IN-VISE: 34.64504
activation-level PICK-UP-SANDER: 0.12393018
activation-level PICK-UP-SPRAYER: 40.380215
activation-level PICK-UP-BOARD: 36.582684
activation-level PUT-DOWN-SPRAYER: 3.720613
activation-level PUT-DOWN-SANDER: 0.0
activation-level PUT-DOWN-BOARD: 1.798291

NO MODULE becoming active
threshold is lowered to 40.5

At  time  5,  the  spreading  activation  pattern  is  similar  to  that  of  time  4.  The 
state  and  the  goals  spread  activation  to  the  same  modules.  Also  modules  keep 
spreading  activation  to  the  same  modules,  except  that  now  the  amounts  they  give 
and  take  away  have  changed  (because  the  activation  levels  of  the  modules  at  time 
4  are  different  from  those  at  time  3). 

TIME: 5

state gives . . .

PLACE-BOARD-IN-VISE spreads ...

activation-levels of modules after decay:

activation-level PLACE-BOARD-IN-VISE: 15.370311
activation-level SPRAY-PAINT-SELF: 27.239319
activation-level SAND-BOARD-IN-HAND: 34.161552
activation-level SAND-BOARD-IN-VISE: 33.368526
activation-level PICK-UP-SANDER: 0.0
activation-level PICK-UP-SPRAYER: 41.26312
activation-level PICK-UP-BOARD: 41.91644
activation-level PUT-DOWN-SPRAYER: 4.2737665
activation-level PUT-DOWN-SANDER: 0.027907925
activation-level PUT-DOWN-BOARD: 2.379075

module becoming active: PICK-UP-BOARD


The module that becomes active is PICK-UP-BOARD. The state of the
environment changes by the actions performed by this module, so that the input from
the state and the internal spreading activation patterns are different at time 6.

TIME:  6 


state of the environment: (BOARD-IN-HAND SANDER-IN-HAND SPRAYER-SOMEWHERE OPERATIONAL)
goals of the environment: (BOARD-SANDED SELF-PAINTED)
protected goals of the environment: NIL

state gives PLACE-BOARD-IN-VISE an extra activation of 6.6666665
state gives SAND-BOARD-IN-HAND an extra activation of 2.2222223
state gives PUT-DOWN-BOARD an extra activation of 6.6666665
state gives SAND-BOARD-IN-HAND an extra activation of 2.2222223
state gives SAND-BOARD-IN-VISE an extra activation of 2.2222223
state gives PUT-DOWN-SANDER an extra activation of 6.6666665
state gives PICK-UP-SPRAYER an extra activation of 10.0
state gives SPRAY-PAINT-SELF an extra activation of 3.3333333
state gives SAND-BOARD-IN-HAND an extra activation of 2.2222223
state gives SAND-BOARD-IN-VISE an extra activation of 2.2222223
goals give SAND-BOARD-IN-HAND an extra activation of 35.0
goals give SAND-BOARD-IN-VISE an extra activation of 35.0
goals give SPRAY-PAINT-SELF an extra activation of 70.0

PLACE-BOARD-IN-VISE spreads 0.7319196 forward to PICK-UP-SANDER for HAND-IS-EMPTY
PLACE-BOARD-IN-VISE spreads 0.7319196 forward to PICK-UP-SPRAYER for HAND-IS-EMPTY
PLACE-BOARD-IN-VISE spreads 0.7319196 forward to PICK-UP-BOARD for HAND-IS-EMPTY
PLACE-BOARD-IN-VISE spreads 1.4638392 forward to SAND-BOARD-IN-VISE for BOARD-IN-VISE
PLACE-BOARD-IN-VISE decreases (inhibits) PUT-DOWN-BOARD with 10.978794 for BOARD-IN-HAND
SPRAY-PAINT-SELF spreads 27.239319 backward to PICK-UP-SPRAYER for SPRAYER-IN-HAND
SAND-BOARD-IN-HAND decreases (inhibits) PLACE-BOARD-IN-VISE with 12.200555 for BOARD-IN-HAND
SAND-BOARD-IN-HAND decreases (inhibits) PUT-DOWN-BOARD with 12.200555 for BOARD-IN-HAND
SAND-BOARD-IN-HAND decreases (inhibits) PUT-DOWN-SANDER with 24.40111 for SANDER-IN-HAND
SAND-BOARD-IN-HAND decreases (inhibits) SPRAY-PAINT-SELF with 24.40111 for OPERATIONAL
SAND-BOARD-IN-VISE spreads 16.684263 backward to PLACE-BOARD-IN-VISE for BOARD-IN-VISE
SAND-BOARD-IN-VISE decreases (inhibits) PUT-DOWN-SANDER with 23.834661 for SANDER-IN-HAND
SAND-BOARD-IN-VISE decreases (inhibits) SPRAY-PAINT-SELF with 23.834661 for OPERATIONAL
PICK-UP-SANDER spreads 0.0 backward to PUT-DOWN-SANDER for SANDER-SOMEWHERE
PICK-UP-SANDER spreads 0.0 backward to PLACE-BOARD-IN-VISE for HAND-IS-EMPTY
PICK-UP-SANDER spreads 0.0 backward to PUT-DOWN-SPRAYER for HAND-IS-EMPTY
PICK-UP-SANDER spreads 0.0 backward to PUT-DOWN-SANDER for HAND-IS-EMPTY
PICK-UP-SANDER spreads 0.0 backward to PUT-DOWN-BOARD for HAND-IS-EMPTY
PICK-UP-SPRAYER spreads 5.15789 backward to PLACE-BOARD-IN-VISE for HAND-IS-EMPTY
PICK-UP-SPRAYER spreads 5.15789 backward to PUT-DOWN-SPRAYER for HAND-IS-EMPTY
PICK-UP-SPRAYER spreads 5.15789 backward to PUT-DOWN-SANDER for HAND-IS-EMPTY
PICK-UP-SPRAYER spreads 5.15789 backward to PUT-DOWN-BOARD for HAND-IS-EMPTY
PICK-UP-BOARD spreads 0.0 backward to PUT-DOWN-BOARD for BOARD-SOMEWHERE
PICK-UP-BOARD spreads 0.0 backward to PLACE-BOARD-IN-VISE for HAND-IS-EMPTY
PICK-UP-BOARD spreads 0.0 backward to PUT-DOWN-SPRAYER for HAND-IS-EMPTY
PICK-UP-BOARD spreads 0.0 backward to PUT-DOWN-SANDER for HAND-IS-EMPTY
PICK-UP-BOARD spreads 0.0 backward to PUT-DOWN-BOARD for HAND-IS-EMPTY
PUT-DOWN-SPRAYER spreads 4.2737665 backward to PICK-UP-SPRAYER for SPRAYER-IN-HAND
PUT-DOWN-SANDER spreads 0.0039868467 forward to PICK-UP-SANDER for SANDER-SOMEWHERE
PUT-DOWN-SANDER spreads 0.0013289489 forward to PICK-UP-SANDER for HAND-IS-EMPTY
PUT-DOWN-SANDER spreads 0.0013289489 forward to PICK-UP-SPRAYER for HAND-IS-EMPTY
PUT-DOWN-SANDER spreads 0.0013289489 forward to PICK-UP-BOARD for HAND-IS-EMPTY
PUT-DOWN-BOARD spreads 0.3398679 forward to PICK-UP-BOARD for BOARD-SOMEWHERE
PUT-DOWN-BOARD spreads 0.1132893 forward to PICK-UP-SANDER for HAND-IS-EMPTY
PUT-DOWN-BOARD spreads 0.1132893 forward to PICK-UP-SPRAYER for HAND-IS-EMPTY
PUT-DOWN-BOARD spreads 0.1132893 forward to PICK-UP-BOARD for HAND-IS-EMPTY

activation-levels of modules after decay:

activation-level PLACE-BOARD-IN-VISE: 18.660386
activation-level SPRAY-PAINT-SELF: 30.829237
activation-level SAND-BOARD-IN-HAND: 44.666897
activation-level SAND-BOARD-IN-VISE: 43.763033
activation-level PICK-UP-SANDER: 0.60100476
activation-level PICK-UP-SPRAYER: 49.26829
activation-level PICK-UP-BOARD: 0.6988667
activation-level PUT-DOWN-SPRAYER: 6.6667623
activation-level PUT-DOWN-SANDER: 3.0382743
activation-level PUT-DOWN-BOARD: 3.0382743

NO MODULE becoming active
threshold is lowered to 40.5


Again  the  spreading  activation  patterns  at  time  7  are  like  those  at  time  6. 
In  particular  SAND-BOARD-IN-HAND  will  now  have  received  enough  activation 
from  the  state  and  the  goals  to  become  active.  Notice  that  although  PICK-UP- 
SPRAYER  has  a  very  high  activation  level,  it  does  not  become  active  because  not 
all  of  its  preconditions  are  fulfilled. 

TIME:  7 

state gives . . .

PLACE-BOARD-IN-VISE spreads ...

activation-levels of modules after decay:

activation-level PLACE-BOARD-IN-VISE: 19.967524
activation-level SPRAY-PAINT-SELF: 21.800142
activation-level SAND-BOARD-IN-HAND: 45.89836
activation-level SAND-BOARD-IN-VISE: 45.175903
activation-level PICK-UP-SANDER: 1.1233512
activation-level PICK-UP-SPRAYER: 51.47401
activation-level PICK-UP-BOARD: 1.2285371
activation-level PUT-DOWN-SPRAYER: 6.3068533
activation-level PUT-DOWN-SANDER: 3.486372
activation-level PUT-DOWN-BOARD: 3.5389647

module becoming active: SAND-BOARD-IN-HAND

As a consequence the state and goals change. The only remaining goal to
be achieved is 'self-painted'. In order to do so, the robot has to free at least
one hand. Notice that PICK-UP-SPRAYER spreads backwards to the modules
that can achieve this, i.e., PLACE-BOARD-IN-VISE, PUT-DOWN-SANDER and
PUT-DOWN-BOARD.

TIME:  8 

state of the environment: (BOARD-SANDED BOARD-IN-HAND SANDER-IN-HAND SPRAYER-SOMEWHERE
                           OPERATIONAL)
goals of the environment: (SELF-PAINTED)
protected goals of the environment: (BOARD-SANDED)

state gives PLACE-BOARD-IN-VISE an extra activation of 6.6666665
state gives SAND-BOARD-IN-HAND an extra activation of 2.2222223
state gives PUT-DOWN-BOARD an extra activation of 6.6666665
state gives SAND-BOARD-IN-HAND an extra activation of 2.2222223
state gives SAND-BOARD-IN-VISE an extra activation of 2.2222223
state gives PUT-DOWN-SANDER an extra activation of 6.6666665
state gives PICK-UP-SPRAYER an extra activation of 10.0
state gives SPRAY-PAINT-SELF an extra activation of 3.3333333
state gives SAND-BOARD-IN-HAND an extra activation of 2.2222223
state gives SAND-BOARD-IN-VISE an extra activation of 2.2222223
goals give SPRAY-PAINT-SELF an extra activation of 70.0

PLACE-BOARD-IN-VISE spreads 0.9508346 forward to PICK-UP-SANDER for HAND-IS-EMPTY
PLACE-BOARD-IN-VISE spreads 0.9508346 forward to PICK-UP-SPRAYER for HAND-IS-EMPTY
PLACE-BOARD-IN-VISE spreads 0.9508346 forward to PICK-UP-BOARD for HAND-IS-EMPTY
PLACE-BOARD-IN-VISE spreads 1.901669 forward to SAND-BOARD-IN-VISE for BOARD-IN-VISE
PLACE-BOARD-IN-VISE decreases (inhibits) PUT-DOWN-BOARD with 14.262517 for BOARD-IN-HAND
SPRAY-PAINT-SELF spreads 21.800142 backward to PICK-UP-SPRAYER for SPRAYER-IN-HAND
SAND-BOARD-IN-HAND decreases (inhibits) PLACE-BOARD-IN-VISE with 0.0 for BOARD-IN-HAND
SAND-BOARD-IN-HAND decreases (inhibits) PUT-DOWN-BOARD with 0.0 for BOARD-IN-HAND
SAND-BOARD-IN-HAND decreases (inhibits) PUT-DOWN-SANDER with 0.0 for SANDER-IN-HAND
SAND-BOARD-IN-HAND decreases (inhibits) SPRAY-PAINT-SELF with 0.0 for OPERATIONAL
SAND-BOARD-IN-VISE spreads 22.587952 backward to PLACE-BOARD-IN-VISE for BOARD-IN-VISE
SAND-BOARD-IN-VISE decreases (inhibits) PUT-DOWN-SANDER with 32.2685 for SANDER-IN-HAND
SAND-BOARD-IN-VISE decreases (inhibits) SPRAY-PAINT-SELF with 32.2685 for OPERATIONAL
PICK-UP-SANDER spreads 0.5616756 backward to PUT-DOWN-SANDER for SANDER-SOMEWHERE
PICK-UP-SANDER spreads 0.1404189 backward to PLACE-BOARD-IN-VISE for HAND-IS-EMPTY
PICK-UP-SANDER spreads 0.1404189 backward to PUT-DOWN-SPRAYER for HAND-IS-EMPTY
PICK-UP-SANDER spreads 0.1404189 backward to PUT-DOWN-SANDER for HAND-IS-EMPTY
PICK-UP-SANDER spreads 0.1404189 backward to PUT-DOWN-BOARD for HAND-IS-EMPTY
PICK-UP-SPRAYER spreads 6.4342513 backward to PLACE-BOARD-IN-VISE for HAND-IS-EMPTY
PICK-UP-SPRAYER spreads 6.4342513 backward to PUT-DOWN-SPRAYER for HAND-IS-EMPTY
PICK-UP-SPRAYER spreads 6.4342513 backward to PUT-DOWN-SANDER for HAND-IS-EMPTY
PICK-UP-SPRAYER spreads 6.4342513 backward to PUT-DOWN-BOARD for HAND-IS-EMPTY
PICK-UP-BOARD spreads 0.61426864 backward to PUT-DOWN-BOARD for BOARD-SOMEWHERE
PICK-UP-BOARD spreads 0.15356714 backward to PLACE-BOARD-IN-VISE for HAND-IS-EMPTY
PICK-UP-BOARD spreads 0.15356714 backward to PUT-DOWN-SPRAYER for HAND-IS-EMPTY
PICK-UP-BOARD spreads 0.15356714 backward to PUT-DOWN-SANDER for HAND-IS-EMPTY
PICK-UP-BOARD spreads 0.15356714 backward to PUT-DOWN-BOARD for HAND-IS-EMPTY
PUT-DOWN-SPRAYER spreads 6.3068533 backward to PICK-UP-SPRAYER for SPRAYER-IN-HAND
PUT-DOWN-SANDER spreads 0.49805316 forward to PICK-UP-SANDER for SANDER-SOMEWHERE
PUT-DOWN-SANDER spreads 0.16601773 forward to PICK-UP-SANDER for HAND-IS-EMPTY
PUT-DOWN-SANDER spreads 0.16601773 forward to PICK-UP-SPRAYER for HAND-IS-EMPTY
PUT-DOWN-SANDER spreads 0.16601773 forward to PICK-UP-BOARD for HAND-IS-EMPTY
PUT-DOWN-BOARD spreads 0.5055664 forward to PICK-UP-BOARD for BOARD-SOMEWHERE
PUT-DOWN-BOARD spreads 0.16852213 forward to PICK-UP-SANDER for HAND-IS-EMPTY
PUT-DOWN-BOARD spreads 0.16852213 forward to PICK-UP-SPRAYER for HAND-IS-EMPTY
PUT-DOWN-BOARD spreads 0.16852213 forward to PICK-UP-BOARD for HAND-IS-EMPTY

activation-levels of modules after decay:

activation-level PLACE-BOARD-IN-VISE: 37.119087
activation-level SPRAY-PAINT-SELF: 41.70643
activation-level SAND-BOARD-IN-HAND: 4.422858
activation-level SAND-BOARD-IN-VISE: 34.181183
activation-level PICK-UP-SANDER: 1.9284406
activation-level PICK-UP-SPRAYER: 60.28337
activation-level PICK-UP-BOARD: 2.0032084
activation-level PUT-DOWN-SPRAYER: 8.647864
activation-level PUT-DOWN-SANDER: 4.8363376
activation-level PUT-DOWN-BOARD: 4.8712296

NO MODULE becoming active
threshold is lowered to 40.5

From time 9 through 17 the activation patterns remain the same.
SPRAY-PAINT-SELF accumulates activation coming from the goals and spreads this
activation further towards its only predecessor, namely PICK-UP-SPRAYER.
PICK-UP-SPRAYER spreads the received activation further backwards towards the modules
that can make its precondition 'hand-is-empty' true. Because there are many such
modules, it takes some time before one of them is selected.

TIME: 17

state gives . . .

PLACE-BOARD-IN-VISE spreads ...

activation-levels of modules after decay:

activation-level PLACE-BOARD-IN-VISE: 17.6025
activation-level SPRAY-PAINT-SELF: 61.41764
activation-level SAND-BOARD-IN-HAND: 6.4295135
activation-level SAND-BOARD-IN-VISE: 6.108067
activation-level PICK-UP-SANDER: 2.5221777
activation-level PICK-UP-SPRAYER: 79.323494
activation-level PICK-UP-BOARD: 2.2743216
activation-level PUT-DOWN-SPRAYER: 10.060002
activation-level PUT-DOWN-SANDER: 8.496746
activation-level PUT-DOWN-BOARD: 6.7055316

module becoming active: PLACE-BOARD-IN-VISE

Finally PLACE-BOARD-IN-VISE becomes active, and makes one hand free.
As a result PICK-UP-SPRAYER (which had already accumulated enough
activation) is executable.

TIME:  18 


state of the environment: (HAND-IS-EMPTY BOARD-IN-VISE BOARD-SANDED SANDER-IN-HAND SPRAYER-SOMEWHERE OPERATIONAL)
goals of the environment: (SELF-PAINTED)
protected goals of the environment: (BOARD-SANDED)

state gives PICK-UP-SANDER an extra activation of 3.3333333
state gives PICK-UP-SPRAYER an extra activation of 3.3333333
state gives PICK-UP-BOARD an extra activation of 3.3333333
state gives SAND-BOARD-IN-VISE an extra activation of 6.6666665
state gives SAND-BOARD-IN-HAND an extra activation of 2.2222223
state gives SAND-BOARD-IN-VISE an extra activation of 2.2222223
state gives PUT-DOWN-SANDER an extra activation of 6.6666665
state gives PICK-UP-SPRAYER an extra activation of 10.0
state gives SPRAY-PAINT-SELF an extra activation of 3.3333333
state gives SAND-BOARD-IN-HAND an extra activation of 2.2222223
state gives SAND-BOARD-IN-VISE an extra activation of 2.2222223
goals give SPRAY-PAINT-SELF an extra activation of 70.0


PLACE-BOARD-IN-VISE spreads 0.0 backward to PICK-UP-BOARD for BOARD-IN-HAND
SPRAY-PAINT-SELF spreads 61.41764 backward to PICK-UP-SPRAYER for SPRAYER-IN-HAND
SAND-BOARD-IN-HAND spreads 6.4295135 backward to PICK-UP-BOARD for BOARD-IN-HAND
SAND-BOARD-IN-HAND decreases (inhibits) PUT-DOWN-SANDER with 4.5925097 for SANDER-IN-HAND
SAND-BOARD-IN-HAND decreases (inhibits) SPRAY-PAINT-SELF with 4.5925097 for OPERATIONAL
SAND-BOARD-IN-VISE decreases (inhibits) PUT-DOWN-SANDER with 4.362905 for SANDER-IN-HAND
SAND-BOARD-IN-VISE decreases (inhibits) SPRAY-PAINT-SELF with 4.362905 for OPERATIONAL
PICK-UP-SANDER spreads 1.2610888 backward to PUT-DOWN-SANDER for SANDER-SOMEWHERE
PICK-UP-SANDER decreases (inhibits) PICK-UP-BOARD with 0.45038888 for HAND-IS-EMPTY
PICK-UP-SPRAYER spreads 5.665964 forward to SPRAY-PAINT-SELF for SPRAYER-IN-HAND
PICK-UP-SPRAYER spreads 11.331928 forward to PUT-DOWN-SPRAYER for SPRAYER-IN-HAND
PICK-UP-SPRAYER decreases (inhibits) PICK-UP-SANDER with 14.16491 for HAND-IS-EMPTY
PICK-UP-SPRAYER decreases (inhibits) PICK-UP-BOARD with 14.16491 for HAND-IS-EMPTY
PICK-UP-BOARD spreads 1.1371608 backward to PUT-DOWN-BOARD for BOARD-SOMEWHERE
PUT-DOWN-SPRAYER spreads 10.080002 backward to PICK-UP-SPRAYER for SPRAYER-IN-HAND
PUT-DOWN-SANDER spreads 1.2138209 forward to PICK-UP-SANDER for SANDER-SOMEWHERE
PUT-DOWN-BOARD spreads 5.7066316 backward to PICK-UP-BOARD for BOARD-IN-HAND

activation-levels  of  modules  after  decay: 
activation-level PLACE-BOARD-IN-VISE: 0.0
activation-level SPRAY-PAINT-SELF: 71.77567
activation-level SAND-BOARD-IN-HAND: 5.936989
activation-level SAND-BOARD-IN-VISE: 9.401387
activation-level PICK-UP-SANDER: 0.6627248
activation-level PICK-UP-SPRAYER: 89.61462
activation-level PICK-UP-BOARD: 3.1151197
activation-level PUT-DOWN-SPRAYER: 11.679616
activation-level PUT-DOWN-SANDER: 4.077989
activation-level PUT-DOWN-BOARD: 3.7369893

module  becoming  active:  PICK-UP-SPRAYER 

And finally, the module SPRAY-PAINT-SELF (which had also already accumulated
enough activation) becomes executable and is selected.


TIME:  19 

state of the environment: (SPRAYER-IN-HAND BOARD-IN-VISE BOARD-SANDED SANDER-IN-HAND OPERATIONAL)
goals of the environment: (SELF-PAINTED)
protected goals of the environment: (BOARD-SANDED)

state gives SPRAY-PAINT-SELF an extra activation of 5.0
state gives PUT-DOWN-SPRAYER an extra activation of 10.0
state gives SAND-BOARD-IN-VISE an extra activation of 6.6666665
state gives SAND-BOARD-IN-HAND an extra activation of 2.2222223
state gives SAND-BOARD-IN-VISE an extra activation of 2.2222223
state gives PUT-DOWN-SANDER an extra activation of 6.6666665
state gives SPRAY-PAINT-SELF an extra activation of 3.3333333
state gives SAND-BOARD-IN-HAND an extra activation of 2.2222223
state gives SAND-BOARD-IN-VISE an extra activation of 2.2222223
goals give SPRAY-PAINT-SELF an extra activation of 70.0

PLACE-BOARD-IN-VISE spreads 0.0 backward to PICK-UP-BOARD for BOARD-IN-HAND
SPRAY-PAINT-SELF decreases (inhibits) PUT-DOWN-SPRAYER with 51.268337 for SPRAYER-IN-HAND
SAND-BOARD-IN-HAND spreads 5.936989 backward to PICK-UP-BOARD for BOARD-IN-HAND
SAND-BOARD-IN-HAND decreases (inhibits) PUT-DOWN-SANDER with 4.2407064 for SANDER-IN-HAND
SAND-BOARD-IN-HAND decreases (inhibits) SPRAY-PAINT-SELF with 4.2407064 for OPERATIONAL
SAND-BOARD-IN-VISE decreases (inhibits) PUT-DOWN-SANDER with 6.7162624 for SANDER-IN-HAND
SAND-BOARD-IN-VISE decreases (inhibits) SPRAY-PAINT-SELF with 6.7162624 for OPERATIONAL
PICK-UP-SANDER spreads 0.3313624 backward to PUT-DOWN-SANDER for SANDER-SOMEWHERE
PICK-UP-SANDER spreads 0.0828406 backward to PLACE-BOARD-IN-VISE for HAND-IS-EMPTY
PICK-UP-SANDER spreads 0.0828406 backward to PUT-DOWN-SPRAYER for HAND-IS-EMPTY
PICK-UP-SANDER spreads 0.0828406 backward to PUT-DOWN-SANDER for HAND-IS-EMPTY
PICK-UP-SANDER spreads 0.0828406 backward to PUT-DOWN-BOARD for HAND-IS-EMPTY
PICK-UP-SPRAYER spreads 0.0 backward to PUT-DOWN-SPRAYER for SPRAYER-SOMEWHERE
PICK-UP-SPRAYER spreads 0.0 backward to PLACE-BOARD-IN-VISE for HAND-IS-EMPTY
PICK-UP-SPRAYER spreads 0.0 backward to PUT-DOWN-SPRAYER for HAND-IS-EMPTY
PICK-UP-SPRAYER spreads 0.0 backward to PUT-DOWN-SANDER for HAND-IS-EMPTY
PICK-UP-SPRAYER spreads 0.0 backward to PUT-DOWN-BOARD for HAND-IS-EMPTY
PICK-UP-BOARD spreads 1.5575598 backward to PUT-DOWN-BOARD for BOARD-SOMEWHERE
PICK-UP-BOARD spreads 0.38938996 backward to PLACE-BOARD-IN-VISE for HAND-IS-EMPTY
PICK-UP-BOARD spreads 0.38938996 backward to PUT-DOWN-SPRAYER for HAND-IS-EMPTY
PICK-UP-BOARD spreads 0.38938996 backward to PUT-DOWN-SANDER for HAND-IS-EMPTY
PICK-UP-BOARD spreads 0.38938996 backward to PUT-DOWN-BOARD for HAND-IS-EMPTY
PUT-DOWN-SPRAYER spreads 1.6685166 forward to PICK-UP-SPRAYER for SPRAYER-SOMEWHERE
PUT-DOWN-SPRAYER spreads 0.5561722 forward to PICK-UP-SANDER for HAND-IS-EMPTY
PUT-DOWN-SPRAYER spreads 0.5561722 forward to PICK-UP-SPRAYER for HAND-IS-EMPTY
PUT-DOWN-SPRAYER spreads 0.5561722 forward to PICK-UP-BOARD for HAND-IS-EMPTY
PUT-DOWN-SANDER spreads 0.5825699 forward to PICK-UP-SANDER for SANDER-SOMEWHERE
PUT-DOWN-SANDER spreads 0.19418997 forward to PICK-UP-SANDER for HAND-IS-EMPTY
PUT-DOWN-SANDER spreads 0.19418997 forward to PICK-UP-SPRAYER for HAND-IS-EMPTY
PUT-DOWN-SANDER spreads 0.19418997 forward to PICK-UP-BOARD for HAND-IS-EMPTY
PUT-DOWN-BOARD spreads 3.7369893 backward to PICK-UP-BOARD for BOARD-IN-HAND

activation-levels of modules after decay:

activation-level PLACE-BOARD-IN-VISE: 0.47223058
activation-level SPRAY-PAINT-SELF: 139.15306
activation-level SAND-BOARD-IN-HAND: 10.3814336
activation-level SAND-BOARD-IN-VISE: 20.612478
activation-level PICK-UP-SANDER: 1.996667
activation-level PICK-UP-SPRAYER: 2.4188788
activation-level PICK-UP-BOARD: 13.538461
activation-level PUT-DOWN-SPRAYER: 0.47223065
activation-level PUT-DOWN-SANDER: 0.8035929
activation-level PUT-DOWN-BOARD: 6.76578

module becoming active: SPRAY-PAINT-SELF


5  Results 

The algorithm presented in this paper can be modeled by a system of differential
equations. This system is however too complicated to solve, so that exact
predictions about the resulting action selection behavior are not possible.
Nevertheless, important qualitative results can be obtained, for example on
possible phase transitions with the growth of parameters such as the size of the
network, the mean fanout of a node, etc. (Huberman & Hogg, 1987). We have
evaluated the algorithm empirically by performing a wide series of experiments
using several example applications. The networks had such diverse properties as
being very ‘wide’, very ‘long’, containing cycles, local high concentrations of
links, unlinked subnetworks, destructive modules, conflicting and mutually
conflicting modules, etc. All of the problems presented were solved for large
ranges of parameters.

The simulated societies cannot be said to show a ‘jump-first think-never’
behavior. They do exhibit planning capabilities. They ‘consider’ to some extent
the effects of a sequence of actions before actually embarking on its execution.
If a sequence of competence modules exists that transforms the current situation
into the goal state, then this sequence becomes highly activated through the
cumulative effect of the forward spreading (starting from the current state) and
the backward spreading (starting from the goals). If this sequence potentially
implies negative effects, it is weakened by the inhibition rules.

More specifically, goal-relevance of the selected action is obtained through the
input from the goals and the backward spreading of activation. Situation
relevance and opportunistic behavior are obtained through the input of the state
and the spreading of activation forward. Conflicting and interacting goals are
taken into account through inhibition by the protected goals and inhibition
among conflicting modules. Further, local maxima in the action selection are
avoided, provided that the spreading of activation can go on long enough (the
threshold is high enough), so that the network can evolve towards the optimal
activity pattern. And finally, the algorithm automatically biases towards
ongoing plans, because these tend to have a shorter distance between state and
goals and are favored by the remains of the past spreading activation patterns.
Moreover, the global parameters serve as controls by which one can mediate
smoothly among these different action selection characteristics.

The  notion  of  a  plan  is  here  very  different  from  the  classical  one  existing  in 
AI.  A  network  does  not  construct  an  explicit  representation  of  a  single  plan,  but 
instead  expresses  its  ‘intention’  or  ‘urge’  to  take  certain  actions  by  high  activation 
levels  of  the  corresponding  modules.  Another  important  difference  is  that  there  is 
no  centralized  preprogrammed  search  process.  Instead,  the  operators  (competence 
modules)  themselves  select  the  sequence  of  operators  that  are  activated,  and  this 
in  a  non-hierarchical,  highly  distributed  way.  There  is  no  search  tree  constructed, 
i.e.,  there  is  no  explicit  representation  built  of  state  changes  after  taking  certain 
actions. 

Consequently, the system does not suffer from the disadvantages of search trees,
such as: information is duplicated in several parts of a tree; trees grow
exponentially with the size of the problem; trees only make a strict
representation of plans possible (it is impossible to work with uncertainties);
etc. In addition, the spreading activation process is a much cheaper operation.
Of course these advantages are not cost-free. The action selection produced is
less ‘rational’ than that of the sophisticated deliberative planners built in
AI. On the other hand the latter systems, when applied in autonomous agents,
suffer from brittleness and slowness. What is particularly interesting about the
algorithm presented here is that it provides parameters to mediate between
adaptivity, speed and reactivity on the one hand and thoughtfulness and
rationality on the other hand.

The  following  subsections  discuss  the  results  observed  in  detail. 


5.1  Goal-Orientedness 

The algorithm selects actions that contribute to the global goals of the agent.
Given that g is a global goal of the network, then γ of new activation energy is
put into the modules that achieve this goal. These modules will in turn, for
each subgoal (false precondition), increase the activation level of the modules
that make this subgoal true, and so on. This backward spreading of activation
ensures that modules that contribute to goal g are more activated than modules
that don’t. Furthermore, modules that contribute to different goals (or
subgoals) receive activation for each of these goals and will therefore be
favored over modules that only contribute to one.

If the agent has more than one goal, modules that contribute to the goal that
is ‘closest’ are favored. ‘Closest’ here means that the path from the
goal-achieving modules to the state-matching modules is the shortest. The
algorithm also favors modules that have little competition. For example, if the
agent has two goals g1 and g2 and if there is one module that achieves g1 and
there are two modules that achieve g2, then the algorithm favors the module that
achieves g1, and therefore the probability of g1 being realized first is higher.
All of these comments hold for subgoals as well as for goals, since subgoals
(false preconditions of modules) are treated the same way as goals.
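
The effect of competition on goal input can be illustrated with a small sketch. It assumes, as a simplification, that the energy γ injected for a goal is shared equally among the modules that achieve it. The function name `goal_input` and the module names `m1`, `m2a`, `m2b` and `both` are hypothetical, and the value 70.0 is merely the goal input that appears in the trace of the previous section.

```python
from typing import Dict, List


def goal_input(gamma: float, achievers: Dict[str, List[str]]) -> Dict[str, float]:
    """Inject gamma per goal, shared equally by the modules that achieve it."""
    energy: Dict[str, float] = {}
    for goal, modules in achievers.items():
        for module in modules:
            energy[module] = energy.get(module, 0.0) + gamma / len(modules)
    return energy


# g1 has a single achiever; g2 has two competing achievers.
single = goal_input(70.0, {"g1": ["m1"], "g2": ["m2a", "m2b"]})

# A module that serves both goals collects a share for each of them.
shared = goal_input(70.0, {"g1": ["m1", "both"], "g2": ["both"]})
```

Under this assumption the sole achiever of g1 receives twice as much goal energy as either achiever of g2, and a module that contributes to both goals accumulates the shares of each.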

The behavior can be made more or less goal-oriented in its selection by varying
the ratio of γ to φ (the amount of activation energy injected by the state per
true proposition). For example, if φ = 0, traditional backward chaining is
performed (i.e., the selection is completely goal-oriented). In that case,
however, the system takes less advantage of opportunities: it is less reactive
and less biased by what is currently observed and what is predicted to become
true in the near future. Furthermore, it is also slowed down because the current
state of the environment does not bias the action selection. Ideally we want a
system that is mainly goal-oriented, but does take advantage of interesting
opportunities. This can be obtained by choosing γ > φ. The optimal ratio is of
course problem dependent (more on choosing the parameter values in section 6.4).

5.2  Situation  Relevance 

The algorithm activates the modules that are relevant to the current situation
more than the ones that are not. The processes responsible for this are the
input of activation energy coming from the state of the environment and the
spreading of activation energy by executable modules towards their successors
(which implements some sort of prediction of what will be true next). As already
mentioned in the previous section, the advantages are that (1) the system biases
its search and thereby speeds up the action selection and (2) the system is able
to exploit opportunities (let its action selection be driven more by what is
happening in the environment). The importance of (2) for an autonomous agent has
recently been recognized by the AI community, as is witnessed by the growth of
interest in so-called reactive systems. The characteristic of
situation-orientedness can be exploited to a greater or lesser degree by varying
the parameter φ. Figure 4 shows the results of experiments with different ratios
for the parameters γ and φ.

The forward spreading rules take care that a module receives activation from
the state in proportion to how ‘close’ it is to being executable given the
current state of the environment. A module is closest to being executable if it
really is executable (i.e., if all its preconditions are fulfilled). For
non-executable modules, ‘closeness’ is inversely proportional to the weighted
sum of the lengths of a path from executable modules to the module itself for
each of the preconditions of the module. This implies, for example, that a
module that has two preconditions p1 and p2 of which one, for example p1, cannot
be made true given the current state, receives relatively less activation from
the state and, therefore, has less probability of being part of a ‘plan’.⁴
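
The input from the state can be sketched as below. The sketch assumes that φ per true proposition is divided by the number of modules that have the proposition as a precondition and by the number of preconditions of the receiving module. The value φ = 20 and the precondition lists are assumptions inferred from the ‘for ...’ labels and amounts in the trace of the previous section, not copied from the module definitions. Under these assumptions the sketch reproduces amounts such as 10.0 for PUT-DOWN-SPRAYER and 2.2222223 for SAND-BOARD-IN-HAND in the TIME 19 listing.

```python
from typing import Dict, List, Set, Tuple

PHI = 20.0  # assumed value of the state-input parameter, inferred from the trace

# Assumed precondition lists, reconstructed from the "for ..." labels in the trace.
PRECONDITIONS: Dict[str, List[str]] = {
    "SPRAY-PAINT-SELF": ["OPERATIONAL", "SPRAYER-IN-HAND"],
    "SAND-BOARD-IN-HAND": ["OPERATIONAL", "BOARD-IN-HAND", "SANDER-IN-HAND"],
    "SAND-BOARD-IN-VISE": ["OPERATIONAL", "BOARD-IN-VISE", "SANDER-IN-HAND"],
    "PICK-UP-SANDER": ["SANDER-SOMEWHERE", "HAND-IS-EMPTY"],
    "PICK-UP-SPRAYER": ["SPRAYER-SOMEWHERE", "HAND-IS-EMPTY"],
    "PICK-UP-BOARD": ["BOARD-SOMEWHERE", "HAND-IS-EMPTY"],
    "PUT-DOWN-SPRAYER": ["SPRAYER-IN-HAND"],
    "PUT-DOWN-SANDER": ["SANDER-IN-HAND"],
    "PUT-DOWN-BOARD": ["BOARD-IN-HAND"],
    "PLACE-BOARD-IN-VISE": ["BOARD-IN-HAND"],
}


def state_input(state: Set[str], phi: float = PHI) -> Dict[Tuple[str, str], float]:
    """Energy received per true precondition, keyed by (module, proposition)."""
    shares: Dict[Tuple[str, str], float] = {}
    for prop in state:
        consumers = [m for m, pre in PRECONDITIONS.items() if prop in pre]
        for module in consumers:
            shares[(module, prop)] = phi / len(consumers) / len(PRECONDITIONS[module])
    return shares


def totals(shares: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Sum the per-proposition shares for each module."""
    out: Dict[str, float] = {}
    for (module, _prop), amount in shares.items():
        out[module] = out.get(module, 0.0) + amount
    return out


# State of the environment at TIME 19 in the trace.
STATE_T19 = {"SPRAYER-IN-HAND", "BOARD-IN-VISE", "BOARD-SANDED",
             "SANDER-IN-HAND", "OPERATIONAL"}
shares = state_input(STATE_T19)
per_module = totals(shares)
```

SAND-BOARD-IN-VISE, whose three preconditions all hold, collects more state energy in total than SAND-BOARD-IN-HAND, for which only two of three hold. The pick-up modules receive nothing, because none of their preconditions is true in this state.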

5.3  Adaptivity 

The  action  selection  process  is  completely  ‘open’.  The  environment  as  well  as  the 
goals  may  change  at  run  time.  As  a  result,  the  external  input/output  as  well  as  the 
internal  activation/inhibition  patterns  will  change  reflecting  the  modified  situation. 
Even  more,  the  external  influence  during  ‘planning’  or  spreading  activation  is  so 
important  that  plans  are  only  formed  as  long  as  the  influence  or  input/output  (or 
‘disturbance’)  from  the  environment  and  goals  is  present. 

Because of this continuous ‘reevaluation’, the action selection behavior adapts
easily to unforeseen or changing situations. For example, if after the
activation of module ‘pick-up-board’ the board is not in the robot’s hand (e.g.
because it slipped away), the same competence module becomes active once more,
because it still receives a lot of activation from the competence modules that
want the board to be in the robot’s hand. Or if there is a second module which
can make that condition become true, then that one will be tried (because the
activation level of ‘pick-up-board’ will have been reset to 0). Serendipity is
another example of this ability to adapt. If a goal or subgoal suddenly appears
to be fulfilled, the modules that contributed to this goal will no longer be
activated. All of these experiments have been simulated with success. Notice
that such unforeseen events do not mean that the system has to ‘drop’ the
ongoing plan and ‘build’ a new one. Actually the system continuously compares
the different alternatives. When some condition changes, this may have the
effect that an alternative (sub-)plan becomes more attractive (more activated)
than the current one.

⁴ It may however receive a lot of activation from the goals and use that
activation to urge its predecessors to make its preconditions true.

Figure 4: These results show that one can mediate between goal-orientedness of
the action selection and data-orientedness by varying the ratio of γ to φ. In
the first experiment, the network performs traditional backward chaining
(φ = 0). In the second experiment there is some forward spreading going on, but
φ is still smaller than γ. The input from the state and forward spreading bias
the search so that the action selection is now much faster. The resulting action
selection is however less optimal (the action selection is more data-driven, so
that actions that are not relevant to the goal may get selected, e.g. in this
case, ‘find-place’ is activated a second time).

a → one → b → two → c → three → d → four → e → five → f

x → one’ → y → two’ → z → three’ → w

Figure 5: A toy network to test adaptivity versus bias (inertia). y → three
stands for ‘proposition y is a precondition of module three’, while three → y
stands for ‘proposition y is in the add-list of module three’.

Notice also that it is not the case that the system replans at every timestep.
The ‘history’ of the spreading activation also plays a role in the action
selection behavior, since the activation levels are not reinitialized at every
timestep. So just as there is a tradeoff between goal-orientedness and
state-orientedness, there is here a tradeoff between adaptivity and bias towards
the ongoing plan (see also next section). One can smoothly mediate between the
two extremes by selecting a particular ratio of the parameters γ and φ versus π
(the mean level of activation).
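
The role of π can be made concrete with a sketch of the decay step. It assumes that decay is a proportional rescaling that brings the mean activation level back to π while keeping the relative order of the modules; the function name `decay` and the toy values are hypothetical. The assumption is at least roughly consistent with the trace of the previous section, where the ten ‘after decay’ levels at TIME 18 add up to about 200.

```python
from typing import Dict


def decay(activation: Dict[str, float], pi: float) -> Dict[str, float]:
    """Rescale all levels so that their mean equals pi (relative order is kept)."""
    total = sum(activation.values())
    if total == 0.0:
        return dict(activation)
    scale = pi * len(activation) / total
    return {module: level * scale for module, level in activation.items()}


raw = {"one": 30.0, "two": 10.0, "three": 20.0}
settled = decay(raw, pi=10.0)
```

Because the total is held at a fixed multiple of π, the fresh input from state and goals (governed by γ and φ) is a small fraction of the carried-over activation when γ and φ are small relative to π. The history of past spreading then dominates the selection.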

Consider as an example the modules of figure 5. The initial state is (a, x), the
goal is f. After module ‘one’ had been active, we added w to the global goals.
When γ and φ are relatively small in comparison with π, the internal spreading
activation has more impact than the influence from the state of the environment
and the global goals. The resulting action selection behavior is therefore less
adaptive. Concretely, in this example it means that, although for goal w the
path from state to goals is shorter, the system continues working on goal f, and
only after f is achieved does it start working on goal w (cf. figure 6). Again
the appropriate solution lies somewhere in the middle. The parameters should be
chosen such that the system does not jump between different goals all the time,
but that it does exploit opportunities and adapts to changing situations.
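
The toy network of figure 5 can be written down as data to check the claim about path lengths. The following is a sketch under assumptions: each module has exactly one precondition and one add-list entry as drawn in the figure, delete-lists are ignored, and the names `Module`, `chain` and `modules_needed` are hypothetical. The count is a simple recursive one that suits such chains and does not try to share common subgoals.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Module:
    name: str
    preconditions: FrozenSet[str]
    add_list: FrozenSet[str]


def chain(names: List[str], props: List[str]) -> List[Module]:
    """Build modules so that props[i] -> names[i] -> props[i + 1]."""
    return [Module(n, frozenset({props[i]}), frozenset({props[i + 1]}))
            for i, n in enumerate(names)]


NETWORK = (chain(["one", "two", "three", "four", "five"], list("abcdef"))
           + chain(["one'", "two'", "three'"], list("xyzw")))


def modules_needed(state: Set[str], goal: str, network: List[Module],
                   seen: FrozenSet[str] = frozenset()) -> float:
    """Fewest modules that must fire to make `goal` true (math.inf if unreachable)."""
    if goal in state:
        return 0
    if goal in seen:  # guard against cycles in the network
        return math.inf
    best = math.inf
    for module in network:
        if goal in module.add_list:
            cost = 1 + sum(modules_needed(state, p, network, seen | {goal})
                           for p in module.preconditions)
            best = min(best, cost)
    return best


# State after module one has fired, assuming it adds b and nothing is deleted.
state = {"a", "b", "x"}
to_f = modules_needed(state, "f", NETWORK)
to_w = modules_needed(state, "w", NETWORK)
```

With these assumptions goal f still needs four modules while the newly added goal w needs only three, which is the situation in which a less adaptive parameter setting keeps the system working on f.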

Notice finally that the algorithm also exhibits another type of adaptivity,
namely fault tolerance. This is a consequence of the distributed nature of the
algorithm. Since no single module is more important than the others, the
networks are still able to perform under degraded conditions. It is possible to
delete competence modules and the network still does whatever is within its
remaining capabilities. For example, when ‘put-board-in-vise’ is deleted or made
inactive, the network comes up with a solution that does not involve this
module.

Figure 6: The action selection behavior can be made less adaptive and more
biased towards ongoing plans by choosing γ and φ relatively small in comparison
with π, as in the first experiment. After module one had been active, we added
the goal w. Although there are fewer modules required to achieve this goal, the
system continues working on goal f. In the second experiment, the system is less
biased towards ongoing goals, because γ and φ are relatively high in comparison
with π.

Figure 7: A toy network to test horizontal bias.

5.4  Bias  to  Ongoing  Plans 

The  algorithm  demonstrates  an  implicit  bias  mechanism.  It  favors  modules  that 
contribute  to  the  ongoing  goal  and  subgoals  except  when  there  is  enough  urge  to 
start  working  on  something  different.  The  main  reason  bias  is  exhibited  is  that 
the  activation  levels  are  not  reinitialized  every  time  a  module  is  activated.  As  a 
consequence  the  history  of  past  activation  spreading  plays  a  role  in  the  selection 
of  action,  in  particular  when  the  effect  of  the  state  and  goals  is  relatively  small  in 
comparison  with  the  mean  activation  level.  But  even  if  that  is  not  the  case,  the 
algorithm  exhibits  bias  towards  ongoing  plans.  More  specifically,  it  demonstrates 
two  types  of  bias:  horizontal  and  vertical. 

1.  Horizontal  Bias 

A first type of bias demonstrated by the action selection algorithm is the
favoring of actions that contribute to the current goal (the goal on which it
was working before). Consider the set of modules in figure 7, with an initial
state S(0) = (a, x) and global goals G(0) = (f, r). One to five are the
competence modules necessary to achieve goal f, while one’ to five’ are the
modules that contribute to goal r.

When simulated, this network does not jump back and forth between modules that
contribute to f and modules that contribute to r. Instead it starts working on
one goal, completes it and then works on the other goal (cf. figure 8). This is
the case because, when either module one or one’ is chosen, the distance of that
path to the goals is shorter than that of the other path. Therefore, the
spreading of activation backwards has a larger effect and makes sure that the
started path is finished first. As the paths from state to goals grow longer,
the threshold has to be increased to obtain this effect (more on the effect of
the threshold in the next section).

2.  Vertical  Bias 

A second type of bias is the favoring of actions that contribute to a ‘brother’
goal (a subgoal of the same overall goal). Consider the modules in figure 9. The
initial state of the environment is S(0) = (a1, c1, e1, g1, a2, c2, e2, g2), the
goals are G(0) = (k1, k2).

Again, if the threshold is high enough, this network first executes all the
actions that contribute to one goal and then starts working on the other goal
(cf. figure 10). The reason is that once a predecessor of a module has been
active, the node itself receives more activation energy from the state of the
environment. Therefore it has more activation to spread to its remaining
predecessors.

Figure 8: When the threshold is high enough, the action selection behavior
exhibits a horizontal bias (left-hand experiment). When the threshold is not
high enough, the system jumps between modules contributing to one goal and
modules contributing to the second goal (right-hand experiment).

Figure 9: A toy network to test vertical bias.

As already stated in the previous section, the system can be given a greater or
lesser degree of ‘inertia’ with respect to the changing environment and goals by
selecting the ratio of the global parameters appropriately. Especially in very
dynamic environments, it might be necessary to make the system adapt more
slowly, otherwise it might never get anything done.

5.5  Avoiding  Goal  Conflicts 

A bad ordering of actions can dramatically increase the number of actions
necessary to achieve a goal, or even prevent a solution from ever being found.
Any action selection algorithm should therefore to some degree be able to
arbitrate among conflicting actions. Our algorithm is able to do so because of
the inhibition rules. The modules in a network that undo a protected goal are
weakened by a factor of δ. If δ is large enough (in particular in relation to γ
and φ), this results in an action selection that protects global goals.

Figure 10: When the threshold is high enough, the action selection behavior
exhibits vertical bias (left-hand experiment). When the threshold is not high
enough, the system jumps between modules contributing to the first goal and
modules contributing to the second goal (right-hand experiment).

Figure 11: The classical conflicting goals example. The initial state of the
world is S(0)=(clear-a, clear-b, a-on-c), the goals are G(0)=(a-on-b, b-on-c).
The system should first achieve the goal b-on-c and then the goal a-on-b. It is
tempted however to immediately stack a onto b, which may bring it into a
deadlock situation (not wanting to undo the already achieved goal).

Figure 12: Some of the modules involved in the blocks world domain.

The same is true for subgoals (or preconditions of modules). Every module
decreases the activation level of modules that undo its true conditions. Again
this results in an action selection behavior in which ‘subgoals’ are protected
and thereby goal conflicts are avoided. To illustrate how this happens, we
reimplemented the classical anomalous situation example from the blocks world
(Sussman, 1975). Figure 11 illustrates the problem. Figure 12 shows some of the
competence modules involved in this example.
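
How much activation one module takes away from a conflicting one can be sketched as follows. This is a hedged illustration: the ratio δ/γ = 5/7 is only suggested by the trace of the previous section (the values 50 and 70 are assumptions that give this ratio), and the two divisors `sharers` and `delete_list_size` are illustrative assumptions chosen so that the sketch matches the printed amounts. The exact normalisation is the one given with the inhibition rules.

```python
def inhibition(level: float, delta: float, gamma: float,
               sharers: int = 1, delete_list_size: int = 1) -> float:
    """Activation a module takes away from one module that undoes a true precondition."""
    return level * (delta / gamma) / (sharers * delete_list_size)


DELTA, GAMMA = 50.0, 70.0  # assumed; only the ratio 5/7 is suggested by the trace

# SAND-BOARD-IN-VISE (level 6.108067) inhibiting PUT-DOWN-SANDER at TIME 18.
sander_loss = inhibition(6.108067, DELTA, GAMMA)

# PICK-UP-SPRAYER (level 79.323494) inhibiting PICK-UP-SANDER at TIME 18;
# the combined divisor of 4 is an assumption chosen to match the printed 14.16491.
pickup_loss = inhibition(79.323494, DELTA, GAMMA, sharers=2, delete_list_size=2)
```

The larger δ is relative to γ, the more a highly activated module suppresses the modules that would undo its true preconditions. This is the mechanism by which ‘subgoals’ are protected.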

Figures 13 and 14 show the results obtained. In the first experiment δ has the
same value as γ, which is far greater than φ. The result is that the inhibition
of ‘stack-a-on-b’ by ‘stack-b-on-c’ for condition ‘clear-b’ is far more
important than its activation by the state. Because of this, the module
‘take-a-from-b’ dominates over ‘stack-a-on-b’, despite the fact that the latter
achieves a goal. If δ is not high enough (as in the second experiment), the urge
to fulfill the goal ‘a-on-b’ dominates over the urge to avoid undoing ‘clear-b’,
so that the system does start by stacking a on b. It is however still able to
restore the situation and obtain the two goals, since the influence from the
protected goals is not high enough to keep the system from undoing the achieved
goal ‘a-on-b’. Again, a balance has to be found between not caring about goal
conflicts at all and being so rigid as to never undo an achieved (sub-)goal,
thereby risking deadlocks.

Figure 13: When the influence from protected goals and the threshold are high
enough, the system is able to avoid problems with conflicting goals.

5.6  Thoughtfulness 

A network only looks ahead in a local neighborhood (in time), which is
determined by the threshold θ. The behavior can be made more thoughtful by
increasing the threshold θ. This makes the spreading activation process go on
for a longer time before a specific action is selected. As such, it allows the
network to look ahead further, thereby avoiding local maxima (in time) of
activation levels. For example, in the blocks-world example above, the module
‘stack-a-on-b’ initially has the highest activation level (since it receives
direct input from both the state and the goals). The threshold has to be set
high enough to prevent this module from being chosen right away, so that the
network can go on taking into account the conflicts among modules.

Figure 14: In both these experiments the system reacts opportunistically, not
taking into account conflicting goals. In the first experiment, the parameter γ
is low, so that the system is not very sensitive to goal conflicts. In the
second experiment, the threshold is not high enough, so that the system chooses
a local maximum.

Ideally, we would like to set the threshold to a very high value (for example,
equal to the total activation of the whole network). This would guarantee that
the spreading activation process goes on long enough for the ‘optimal’ action
to be selected. There are two problems with setting the threshold high: first,
the action selection process would require too much time (especially for an agent
operating in a rapidly changing environment), and second, the agent would get
bogged down trying to take into account the effects of actions it might take in
the far future. This is most probably a wasted effort in an unpredictable
environment. Therefore, we want the agent to look ahead only to the near future.
The desired amount of looking ahead for a particular application
can be obtained by choosing a proper value for the threshold.
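
The trade-off described above can be illustrated with a small toy sketch. It is not the algorithm of this paper: the per-step inputs are invented numbers, chosen only to mimic a module that looks best at first and is later inhibited once conflict information has had time to spread. The function name `select_action` and the choice of ‘stack-b-on-c’ as the conflict-free alternative are assumptions made for the illustration.

```python
def select_action(theta, max_steps=50):
    """Return (module, steps) for the first module whose activation reaches theta.

    Toy model with hypothetical numbers, not the spreading rules of the paper.
    """
    activation = {"stack-a-on-b": 0.0, "stack-b-on-c": 0.0}
    for step in range(1, max_steps + 1):
        # Assumed inputs: 'stack-a-on-b' is fed directly by state and goals,
        # but from step 3 on, inhibition caused by the goal conflict reaches it.
        activation["stack-a-on-b"] += 3.0 if step <= 2 else -1.0
        # Assumed inputs: 'stack-b-on-c' gains activation more slowly but steadily.
        activation["stack-b-on-c"] += 2.0
        best = max(activation, key=activation.get)
        if activation[best] >= theta:
            return best, step
    return None, max_steps


low = select_action(theta=5.0)   # selects early, at a local maximum in time
high = select_action(theta=7.0)  # waits longer, so the conflict is taken into account
```

With the low threshold the toy selects ‘stack-a-on-b’ after two steps; with the higher threshold it selects ‘stack-b-on-c’ after four steps. The extra steps are the price paid for the additional look-ahead.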

5.7  Speed 

The counterpart of thoughtfulness is speed. The action selection behavior can be
made faster by lowering the threshold θ, as explained above. The resulting action
selection is, however, less ‘thoughtful’: it is less goal-oriented, less
situation-oriented, takes conflicting goals less into account, and is
less biased towards ongoing plans. Nevertheless, it may sometimes be important
to react fast, or it may be a wasted effort to be very thoughtful (i.e., to make a
lot of plans and predictions).

Fortunately, the algorithm is not complex, so that speed can be obtained
without sacrificing too much thoughtfulness. The algorithm does, however,
perform some sort of ‘search’ through a network from goal modules to executable
modules, so one could argue that it suffers from the same problems as
traditional AI search; more specifically, that its efficiency necessarily goes down
as the number of modules involved in a plan grows (the so-called ‘combinatorial
explosion’ problem). Nevertheless, it is important to take the following
counterarguments into consideration:

•  The  search  that  is  going  on  here  is  of  a  very  different  nature.  Actually, 
it  resembles  marker  passing  algorithms  more  than  the  AI  notion  of  search. 
The system does not construct a search tree, nor does it maintain a current
hypothetical  state  and  partial  plan.  In  addition,  it  evaluates  different  paths 
in  parallel,  so  that  it  does  not  have  to  start  from  scratch  when  one  path  does 
not  produce  a  solution,  but  smoothly  moves  from  one  plan  to  another.  As  a 
result,  the  computation  the  algorithm  performs  is  much  less  costly. 

•  The  system  does  not  ‘replan’  completely  at  every  timestep.  The  algorithm 
does  not  reinitialize  the  activation-levels  to  zero  whenever  an  action  has  been 
taken.  This  implies  that  it  may  take  some  time  to  select  the  first  action  to 
execute, but from then on, the network is biased towards that particular
situation  and  set  of  goals.  This  means  that  it  will  take  much  less  time  for 
the  following  actions  to  be  selected,  in  particular  when  little  has  changed  in 
the  meantime  with  respect  to  the  goals  or  current  situation. 

• We believe that for real autonomous agents (e.g., mobile robots) the networks
will grow ‘larger’ instead of ‘longer’, because typically the agent will have
more tasks/goals rather than tasks/goals that require more actions to
be taken (and therefore more ‘planning’). Also, large subparts may exist in
the network that appear to be unconnected. As a result, the efficiency of the
system will not be affected so much. Even if some paths from state matchers
to goal achievers were very long, the system would still come up with
an action, because it does not await convergence of the activation levels
and it decreases the threshold with time. The selected action might, however,
be non-optimal.

• The same simple spreading activation rules are applied to each of the modules.
In addition, there are only local, fixed links among modules. This opens up
interesting opportunities for a parallel implementation, which would imply a
considerable speed-up.
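
Two of the points above (activation levels are not reinitialized after an action, and the threshold decreases with time) can be sketched in a few lines. The sketch below is a toy under stated assumptions, not the implementation used in the experiments: the constant per-step `drive` values, the 10% threshold decay, the reset of the threshold after a selection, and the reset of only the selected module's activation are all choices made for the illustration, as is the function name `run`.

```python
def run(drive, theta0, decay=0.9, n_actions=2, max_steps=100):
    """Select n_actions modules in sequence; return [(module, steps waited), ...]."""
    activation = {name: 0.0 for name in drive}
    theta = theta0
    waited = 0
    selected = []
    for _ in range(max_steps):
        waited += 1
        # The same simple local rule is applied to every module.
        for name, gain in drive.items():
            activation[name] += gain
        best = max(activation, key=activation.get)
        if activation[best] >= theta:
            selected.append((best, waited))
            # Assumption: only the selected module is reset; all other
            # activation levels carry over to the next selection.
            activation[best] = 0.0
            theta = theta0
            waited = 0
            if len(selected) == n_actions:
                break
        else:
            # No module is active enough yet: lower the threshold.
            theta *= decay
    return selected


history = run({"m1": 2.0, "m2": 1.6}, theta0=10.0)
```

In this toy run the first action takes four steps to emerge, while the second takes only two, because ‘m2’ keeps the activation it accumulated while ‘m1’ was being selected. The decaying threshold also ensures that some module is eventually selected without waiting for the activation levels to converge.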

6  Discussion 

There  are  a  number  of  limits  to  the  algorithm  as  it  is  now.  The  main  ones  are 
listed  below. 

• The language provided to describe the input-output relationship of a competence
module is oversimplified. There is no way to work with abstractions,
nor can variables be used.

•  A  network  does  not  maintain  a  record  of  its  past  ‘search’.  As  such  the  same 
planning  mistake  can  be  made  over  and  over  again  in  the  same  plan,  making 
the  system  loop. 

•  It  is  not  yet  clear  how,  given  a  specific  application,  one  can  select  values  for 
the  global  parameters  that  produce  the  desired  action  selection  behavior. 

In the remainder of this section we discuss the importance of these limits and
sketch solutions to those that represent important limitations. The proposed
solutions are in keeping with the philosophy of the current approach and preserve
its merits. The implementation of these solutions will be the main concern of our
future research.

6.1  Variables


The algorithm does not incorporate classical variables and variable-passing. As a
matter of fact, many of its advantages would disappear if they were introduced.
For example, one reason a lot of search is eliminated is precisely that there are
no variables in the algorithm. A first implication of the absence of variables is
that one cannot specify goals using variables (e.g., goto-location(x,y)). A second
implication is that all modules/operators of the domain have to be instantiated
beforehand, which means that a node has to be created for every parameter value.

We  try  to  avoid  the  need  for  variables  altogether  by  using  only  so-called  indexical- 
functional  aspects  to  describe  relevant  properties  of  the  immediate  environment 
(Agre  &  Chapman,  1987).  The  main  idea  here  is  that  internal  representations 
of  objects  in  the  environment  are  in  terms  of  the  purposes  and  circumstances  of 
the  agent.  The  module  ‘spray-paint-self’  for  example  only  has  to  be  instantiated 
with  one  parameter,  namely  ‘the-sprayer-I-am-holding-in-my-hand’.  Because  of 
this,  it  is  not  necessary  to  create  new  operators/modules  for  every  new  object  that 
is  introduced  in  the  world.  There  is  no  exhaustive  combination  of  operators  and 
objects. 

The  idea  of  indexical-functional  aspects  is  particularly  interesting  for  autonomous 
agents  because  it  does  not  make  unrealistic  assumptions  about  what  perception 
can  deliver.  In  particular,  it  does  not  demand  that  perception  can  produce  the 
identity and exact location of objects. The absence of variables does constrain the
language one can use to communicate with the system, but not too severely.
All it requires is a new way of thinking about how to tell an agent what to
do. More specifically, one does not use unique names of objects when specifying
goals. Instead, goals are specified in terms of indexical or functional constraints on
the  objects  involved.  For  example,  one  would  not  tell  the  agent  to  go  to  location 
(x,y),  but  one  would  tell  the  agent  that  the  goal  is  to  be  in  a  location  that  is  a 
doorway  (a  small  area  where  it  is  able  to  ‘go  through’  a  wall). 

6.2  Handling  Loops 

A problem with the current algorithm is that loops in the action selection may
emerge. They occur only very rarely and spring from the fact that the system
does not maintain a history of what it did before. It is questionable whether a
solution to such impasses should be built in. The hypothesis could be adopted that
in a real environment the state and goals will change anyway after some very small
time Δt. This changes the spreading activation patterns and therefore
gets the network out of its impasse. If we insist on avoiding (even temporary)
impasses, this cannot be guaranteed by a careful selection of the parameters. One
very simple solution, however, could be to introduce some randomness into the system.
Another solution might be to use a second network to monitor possible loops in the
first network and take action whenever one occurs. Finally, we could implement
some habituation mechanism for some or all of the modules. This mechanism would
ensure that every time a module is activated, it becomes less likely to become active
in the future (i.e., modules would have local thresholds that vary over time).
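
One possible reading of the habituation idea is sketched below. It is only an illustration of the proposal, not something that has been implemented: each module carries a local penalty that is raised when the module is selected and fades afterwards. The names `select_sequence`, `bump` and `recovery`, and all numeric values, are hypothetical.

```python
def select_sequence(drive, n, bump=0.0, recovery=0.5):
    """Select n modules in turn; each selection raises the winner's local penalty."""
    penalty = {name: 0.0 for name in drive}
    chosen = []
    for _ in range(n):
        # A module's effective score is its drive minus its current penalty.
        winner = max(drive, key=lambda name: drive[name] - penalty[name])
        chosen.append(winner)
        # Habituation fades over time for every module...
        for name in penalty:
            penalty[name] *= recovery
        # ...and is renewed for the module that has just been selected.
        penalty[winner] += bump
    return chosen


drive = {"m1": 1.0, "m2": 0.9}
stuck = select_sequence(drive, 4)            # no habituation
varied = select_sequence(drive, 4, bump=0.3)  # with habituation
```

Without habituation the toy selects ‘m1’ four times in a row; with a penalty of 0.3 per selection it alternates between ‘m1’ and ‘m2’. Whether such a mechanism would break real loops without harming goal-directed behavior would have to be determined experimentally.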

6.3  Selecting  the  Parameters 

The global parameters to a large degree determine the effectiveness and
characteristics of the action selection behavior. It is still an open question how
the values for these parameters should be selected. They are very problem-dependent,
not only because every problem area requires different degrees of goal-orientedness,
situation-orientedness, speed, adaptivity, etc., but also because the size and
structure of the network determine these characteristics. For example, in an
application with a very big network, the threshold has to be set higher to obtain
the same results. At the moment we tune the parameters by hand during a series
of experiments. We plan to build a second network of competence modules that
would look at the results of the first one and tune its parameters so as to obtain
the action selection characteristics specified by the user.

7  Related  Work 

The introductory section already discussed how this work relates to connectionism
and traditional AI. The main difference from the former is that more structure
and competence are built in; the difference from the latter is that classical search
is avoided. The remainder of this section compares this work to so-called ‘reactive
systems’, to distributed AI, and to other hybrid systems.

7.1  Reactive  Systems 

The approach is related to the so-called ‘reactive systems’ (Georgeff & Lansky,
1987) (Firby, 1987) (Kaelbling, 1987) (Rosenschein & Kaelbling, 1987) (Schoppers,
1987) (Agre & Chapman, 1987) (Sanborn & Hendler, 1987). The emphasis in
these architectures is on a more direct coupling of perception to action,
distributedness and decentralization, dynamic interaction with the environment and
inherent mechanisms to cope with resource limitations and incomplete knowledge. They
deemphasize  deliberation  (or  ‘thinking’  in  general)  and  internal  models.  The  main 
difference  between  our  algorithm  and  these  systems  is  that  we  neither  ‘prewire’  nor 
‘precompile’  the  control  flow.  The  arbitration  among  modules  is  a  run-time  process 
which  differs  according  to  the  goals  that  are  given  to  the  system  and  the  situation 
the  system  finds  itself  in.  It  therefore  constitutes  a  simpler,  more  distributed  and 
more  general  solution  to  the  problem  of  action  selection. 

7.2  Distributed AI


The difference between this work and the bulk of work in distributed planning
(Bond & Gasser, 1988) (Huhns, 1987), as well as the work on blackboard
systems (Hayes-Roth, 1979), is that in the latter, planning modules communicate
among themselves on a much higher level. They communicate using a language,
sometimes debate and negotiate with one another, or even reason about each
other. The problem-specific need for a communication language therefore
constitutes the major barrier to the widespread applicability of these techniques.
The algorithm presented in this paper makes the integration of different modules in
one system easier because the communication among modules is reduced to a minimum
and happens on an information-scarce level (only numbers are communicated).
Furthermore, modules do not have to share a global internal model or
global blackboard. They are said to communicate ‘through the world’ (Brooks,
1986).


7.3  Hybrid  Systems 

Finally,  this  algorithm  is  related  to  some  of  the  hybrid  systems  that  have  been 
built for planning and decision making. Hendler (1988) describes a hybrid system
in which a massively parallel component is used to provide heuristic information to
a classical AI planner. A marker-propagating network guides the classical planner
towards more relevant plans. Lehnert (1987) describes a hybrid system that uses
a  stack  and  copy  mechanism  for  control  and  numerical  relaxation  over  a  structured 
network  for  smooth  decision  making.  The  difference  with  the  algorithm  presented 
here  is  that  in  both  these  systems  the  control  is  still  hierarchical  and  centralized, 
and  might  therefore  turn  out  to  be  too  inflexible  for  use  in  autonomous  agents 
operating  in  a  dynamic  environment. 


8  Conclusions 

The results reported in this paper demonstrate the feasibility of using
activation/inhibition dynamics among competence modules to solve the problem
of action selection for an autonomous agent operating in a dynamic world. Such a
scheme has particular advantages over traditional, deliberative, hierarchical
methods. The price to pay is that the actions selected might be less rational.
However, the algorithm provides global controls which one can use to tune the action
selection behavior along several criteria, such as thoughtfulness/rationality versus
speed, goal-orientedness versus data-orientedness, and adaptivity versus bias to
ongoing goals.